Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer

Last update: Dec 25, 2022

Related tags

Overview

ConSERT

Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer

Requirements

torch==1.6.0
cudatoolkit==10.0.103
cudnn==7.6.5
sentence-transformers==0.3.9
transformers==3.4.0
tensorboardX==2.1
pandas==1.1.5
sentencepiece==0.1.85
matplotlib==3.4.1
apex==0.1.0

Get Started

Download pre-trained language model (e.g. bert-base-uncased) from HuggingFace's Library
Download STS datasets to ./data folder using SentEval toolkit

Run the following script to run the unsupervised experiment:

python3 main.py --no_pair --seed 1 --use_apex_amp --apex_amp_opt_level O1 --batch_size 96 --max_seq_length 64 --evaluation_steps 200 --add_cl --cl_loss_only --cl_rate 0.15 --temperature 0.1 --learning_rate 0.0000005 --train_data stssick --num_epochs 10 --da_final_1 feature_cutoff --da_final_2 shuffle --cutoff_rate_final_1 0.2 --model_name_or_path [PRETRAINED_BERT_FOLDER] --model_save_path ./output/unsup-base-feature_cutoff-shuffle --force_del --no_dropout --patience 10

where [PRETRAINED_BERT_FOLDER] should be replaced to the folder that contains downloaded pre-trained language model

Citation

@article{yan2021consert,
  title={ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer},
  author={Yan, Yuanmeng and Li, Rumei and Wang, Sirui and Zhang, Fuzheng and Wu, Wei and Xu, Weiran},
  journal={arXiv preprint arXiv:2105.11741},
  year={2021}
}

Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer

Related tags

Overview

ConSERT

Requirements

Get Started

Citation

Owner

Yan Yuanmeng

"Investigating the Limitations of Transformers with Simple Arithmetic Tasks", 2021

Sapiens is a human antibody language model based on BERT.

Unsupervised text tokenizer focused on computational efficiency

Contains analysis of trends from Fitbit Dataset (source: Kaggle) to see how the trends can be applied to Bellabeat customers and Bellabeat products

Wrapper to display a script output or a text file content on the desktop in sway or other wlroots-based compositors

Package for controllable summarization

A very simple framework for state-of-the-art Natural Language Processing (NLP)

The model is designed to train a single and large neural network in order to predict correct translation by reading the given sentence.

🤗 The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use and efficient data manipulation tools

A fast and easy implementation of Transformer with PyTorch.

ADCS - Automatic Defect Classification System (ADCS) for SSMC

Text to speech is a process to convert any text into voice. Text to speech project takes words on digital devices and convert them into audio. Here I have used Google-text-to-speech library popularly known as gTTS library to convert text file to .mp3 file. Hope you like my project!

Experiments in converting wikidata to ftm

Code release for NeX: Real-time View Synthesis with Neural Basis Expansion

Treemap visualisation of Maya scene files

Suite of 500 procedurally-generated NLP tasks to study language model adaptability

A multi-lingual approach to AllenNLP CoReference Resolution along with a wrapper for spaCy.

Wind Speed Prediction using LSTMs in PyTorch

Python module (C extension and plain python) implementing Aho-Corasick algorithm