This repository contains the code for the EMNLP 2021 paper "HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression".

Last update: Mar 24, 2022

Overview

HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression


Requirements

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
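
After installation, it can help to confirm that apex is visible to the Python interpreter you plan to train with. The snippet below is a minimal sketch using only the standard library. The helper name `check_packages` is hypothetical, and listing `torch` as a prerequisite is an assumption (apex is an extension for PyTorch), not something stated in this README.

```python
import importlib.util


def check_packages(names=("torch", "apex")):
    """Map each top-level package name to True if it can be located."""
    # find_spec locates a top-level package without importing it.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

A package reported as found only means it is installed; it does not verify that the C++ and CUDA extensions were built successfully.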

Download checkpoints

Download the vocabulary file of BERT-base (uncased) from HERE, and put it into ./pretrained_ckpt/.
Download the pre-trained checkpoint of BERT-base (uncased) from HERE, and put it into ./pretrained_ckpt/.
Download the 2nd general distillation checkpoint of TinyBERT from HERE, and extract it into ./pretrained_ckpt/.

Prepare dataset

Download the GLUE dataset (which includes MNLI) using the script HERE, and put the files into ./dataset/glue/.
Download the Amazon Reviews dataset from HERE, and extract it into ./dataset/amazon_review/.
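
Before training, a quick layout check can catch a misplaced download. The following is a minimal sketch that only tests whether the three directories named above exist and are non-empty. It does not check individual file names, since this README does not list them. The names `EXPECTED_DIRS` and `missing_or_empty` are hypothetical.

```python
from pathlib import Path

# Directories named in this README, relative to the repository root.
EXPECTED_DIRS = (
    "pretrained_ckpt",
    "dataset/glue",
    "dataset/amazon_review",
)


def missing_or_empty(root="."):
    """Return the expected directories that are absent or contain no entries."""
    problems = []
    for rel in EXPECTED_DIRS:
        path = Path(root) / rel
        if not path.is_dir() or not any(path.iterdir()):
            problems.append(rel)
    return problems


if __name__ == "__main__":
    problems = missing_or_empty()
    if problems:
        print("Missing or empty:", ", ".join(problems))
    else:
        print("All expected directories are populated.")
```

Run it from the repository root, or pass the root path explicitly, for example `missing_or_empty("/path/to/repo")`.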

Train the teacher model (BERT$_{\rm B}$-single) on a single domain

bash train_domain.sh

Distill the student model (BERT$_{\rm S}$) with TinyBERT-KD on a single domain

bash finetune_domain.sh

Train the teacher model (HRKD-teacher) on multiple domains

bash train_multi_domain.sh

Then move the resulting checkpoints into the directories specified at the beginning of finetune_multi_domain.py.

Distill the student model (BERT$_{\rm S}$) with our HRKD on multiple domains

bash finetune_multi_domain.sh
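
The four scripts above can be chained from one driver. The sketch below runs them in the order this README lists them and pauses at the manual checkpoint-placement step. It assumes the scripts sit in the current working directory and take no arguments. Whether the single-domain stages must precede the multi-domain ones is not stated here, so that ordering is an assumption. `STAGES` and `run_all` are hypothetical names.

```python
import subprocess

# Stages in the order this README lists them. "pause" marks the manual step
# where checkpoints are moved before multi-domain distillation.
STAGES = (
    ("run", "train_domain.sh"),
    ("run", "finetune_domain.sh"),
    ("run", "train_multi_domain.sh"),
    ("pause", "Move the teacher checkpoints into place, then press Enter."),
    ("run", "finetune_multi_domain.sh"),
)


def run_all(runner=subprocess.run, confirm=input):
    """Execute every stage in order; stop at the first failing script."""
    for kind, value in STAGES:
        if kind == "pause":
            # Wait for the user to finish the manual step.
            confirm(value)
        else:
            # check=True raises CalledProcessError on a non-zero exit code.
            runner(["bash", value], check=True)
```

Calling `run_all()` from the repository root starts the full sequence. The `runner` and `confirm` parameters exist so the control flow can be dry-run without launching any training.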

Reference

If you find this code helpful for your research, please cite the following paper.

@inproceedings{dong2021hrkd,
  title     = {{HRKD}: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression},
  author    = {Chenhe Dong and Yaliang Li and Ying Shen and Minghui Qiu},
  booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year      = {2021}
}

Owner

Chenhe Dong
