Knowledge-Inheritance

Source code paper: Knowledge Inheritance for Pre-trained Language Models (preprint). The trained model parameters (in Fairseq format) can be downloaded from Tsinghua Cloud. You can use convert_fairseq_to_huggingface.py to convert the Fairseq format into Huggingface's transformers format easily.

We refer the downstream performance evaluation to the implementation of Fairseq (GLUE tasks) and Don't Stop Pre-training (ACL-ARC / CHEMPROT).

If you have any question, feel free to contact us ([email protected]).

1. Available Pretrained Models

WB domain: Wikipedia + BookCorpus; CS domain: computer science papers; BIO domain: biomedical papers;

Models trained by self-learning

RoBERTa_WB_H_4
RoBERTa_WB_H_6
RoBERTa_WB_H_8
RoBERTa_WB_H_10
RoBERTa_WB_D_288
RoBERTa_WB_D_384
RoBERTa_WB_D_480
RoBERTa_WB_D_576
RoBERTa_WB_D_672
RoBERTa_WB_BASE
RoBERTa_WB_MEDIUM
RoBERTa_WB_BASE_PLUS
RoBERTa_WB_LARGE
GPT_WB_MEDIUM
GPT_WB_BASE
GPT_WB_BASE_PLUS
RoBERTa_CS_MEDIUM
RoBERTa_CS_BASE
RoBERTa_BIO_MEDIUM
RoBERTa_BIO_BASE

Models trained by Knowledge Inheritance

RoBERTa_WB_BASE -> RoBERTa_WB_BASE_PLUS
RoBERTa_WB_BASE -> RoBERTa_WB_LARGE
RoBERTa_WB_BASE_PLUS -> RoBERTa_WB_LARGE
RoBERTa_WB_BASE -> RoBERTa_WB_BASE_PLUS -> RoBERTa_WB_LARGE

Source code for paper: Knowledge Inheritance for Pre-trained Language Models

Related tags

Overview

Knowledge-Inheritance

1. Available Pretrained Models

Models trained by self-learning

Models trained by Knowledge Inheritance

Owner

THUNLP

Reimplementation of the paper "Attention, Learn to Solve Routing Problems!" in jax/flax.

PyTorch implementation of SIFT descriptor

Medical image analysis framework merging ANTsPy and deep learning

(to be released) [NeurIPS'21] Transformers Generalize DeepSets and Can be Extended to Graphs and Hypergraphs

Official PyTorch Implementation of Hypercorrelation Squeeze for Few-Shot Segmentation, arXiv 2021

Random Erasing Data Augmentation. Experiments on CIFAR10, CIFAR100 and Fashion-MNIST

Official PyTorch Implementation of "AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting".

Generate text captions for images from their CLIP embeddings. Includes PyTorch model code and example training script.

Implementation for our ICCV2021 paper: Internal Video Inpainting by Implicit Long-range Propagation

ANN model for prediction a spatio-temporal distribution of supercooled liquid in mixed-phase clouds using Doppler cloud radar spectra.

Tools for computational pathology

This is official implementaion of paper "Token Shift Transformer for Video Classification".

[PyTorch] Official implementation of CVPR2021 paper "PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency". https://arxiv.org/abs/2103.05465

PyTorch implementation for Convolutional Networks with Adaptive Inference Graphs

DFM: A Performance Baseline for Deep Feature Matching

Pose Transformers: Human Motion Prediction with Non-Autoregressive Transformers

Graph neural network message passing reframed as a Transformer with local attention

DRIFT is a tool for Diachronic Analysis of Scientific Literature.

A Physics-based Noise Formation Model for Extreme Low-light Raw Denoising (CVPR 2020 Oral & TPAMI 2021)

🐤 Nix-TTS: An Incredibly Lightweight End-to-End Text-to-Speech Model via Non End-to-End Distillation