Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations

Last update: Dec 29, 2022

Overview

Code repository for the paper "Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations".

Dependencies

torch==1.8.1
transformers==4.9.0
sentence-transformers==2.0.0

See `requirements.txt` for the full list of dependencies.
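
Before training, it can help to confirm that the installed packages match the versions listed above. The following is a minimal sketch, not part of the repo: the helper name `find_mismatches` and the `PINNED` table are hypothetical, and the version numbers are copied from this section. It relies only on the standard-library `importlib.metadata` module (Python 3.8+).

```python
from importlib import metadata
from typing import Callable, Dict

# Versions copied from the Dependencies section above (hypothetical table name).
PINNED: Dict[str, str] = {
    "torch": "1.8.1",
    "transformers": "4.9.0",
    "sentence-transformers": "2.0.0",
}


def find_mismatches(
    pinned: Dict[str, str] = PINNED,
    get_version: Callable[[str], str] = metadata.version,
) -> Dict[str, str]:
    """Return {package: found_version} for every package that differs from its pin."""
    mismatches: Dict[str, str] = {}
    for name, wanted in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = "not installed"
        # Some wheels report a local build suffix such as "1.8.1+cu111";
        # compare only the part before "+".
        if found.split("+")[0] != wanted:
            mismatches[name] = found
    return mismatches
```

Calling `find_mismatches()` with no arguments inspects the current environment; an empty dictionary means all three packages match.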

Train

Self-distillation:

>> bash train_self_distill.sh 0

The argument 0 is the index of the GPU device to use.

Mutual-distillation (two GPUs needed):

>> bash train_mutual_distill.sh 1,2

Train with your custom corpus:

>> CUDA_VISIBLE_DEVICES=0,1 python src/mutual_distill_parallel.py \
         --batch_size_bi_encoder 128 \
         --batch_size_cross_encoder 64 \
         --num_epochs_bi_encoder 10 \
         --num_epochs_cross_encoder 1 \
         --cycle 3 \
         --bi_encoder1_pooling_mode cls \
         --bi_encoder2_pooling_mode cls \
         --init_with_new_models \
         --task custom \
         --random_seed 2021 \
         --custom_corpus_path CORPUS_PATH

CORPUS_PATH should point to your custom corpus: a text file in which every line is a sentence pair in the form sent1||sent2.
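
A corpus file in this format can be produced with a few lines of Python. This is a minimal sketch under stated assumptions: the helper names `write_corpus` and `read_corpus` are hypothetical and not part of the repo, and the only detail taken from this README is the `||` separator with one pair per line. How the training script itself parses the file is not shown here, so the reader below is only a sanity check.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

SEPARATOR = "||"  # delimiter stated in this README


def write_corpus(pairs: Iterable[Tuple[str, str]], path: str) -> None:
    """Write sentence pairs to `path`, one 'sent1||sent2' pair per line."""
    lines = []
    for sent1, sent2 in pairs:
        # Collapse whitespace (including newlines) so each pair stays on one line.
        s1, s2 = " ".join(sent1.split()), " ".join(sent2.split())
        if not s1 or not s2:
            raise ValueError("both sentences of a pair must be non-empty")
        if SEPARATOR in s1 or SEPARATOR in s2:
            raise ValueError(f"sentences must not contain {SEPARATOR!r}")
        lines.append(f"{s1}{SEPARATOR}{s2}")
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def read_corpus(path: str) -> List[Tuple[str, str]]:
    """Read the file back and check that every non-blank line holds exactly one pair."""
    pairs = []
    text = Path(path).read_text(encoding="utf-8")
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        parts = line.split(SEPARATOR)
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected exactly one {SEPARATOR!r}")
        pairs.append((parts[0].strip(), parts[1].strip()))
    return pairs
```

For example, `write_corpus([("A man is eating.", "Someone eats food.")], "corpus.txt")` creates a one-line file whose path can then be passed as `--custom_corpus_path`.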

Evaluate

>> python src/eval.py

Authors

Fangyu Liu: Main contributor

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Owner

Amazon
