State of the art Semantic Sentence Embeddings

Overview

Contrastive Tension

State of the art Semantic Sentence Embeddings

Published Paper · Huggingface Models · Report Bug

Overview

This is the official code accompanied with the paper Semantic Re-Tuning via Contrastive Tension.
The paper was accepted at ICLR-2021 and official reviews and responses can be found at OpenReview.

Contrastive Tension(CT) is a fully self-supervised algorithm for re-tuning already pre-trained transformer Language Models, and achieves State-Of-The-Art(SOTA) sentence embeddings for Semantic Textual Similarity(STS). All that is required is hence a pre-trained model and a modestly large text corpus. The results presented in the paper sampled text data from Wikipedia.

This repository contains:

  • Tensorflow 2 implementation of the CT algorithm
  • State of the art pre-trained STS models
  • Tensorflow 2 inference code
  • PyTorch inference code

Requirements

While it is possible that other versions works equally fine, we have worked with the following:

  • Python = 3.6.9
  • Transformers = 4.1.1

Usage

All the models and tokenizers are available via the Huggingface interface, and can be loaded for both Tensorflow and PyTorch:

import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained('Contrastive-Tension/RoBerta-Large-CT-STSb')

TF_model = transformers.TFAutoModel.from_pretrained('Contrastive-Tension/RoBerta-Large-CT-STSb')
PT_model = transformers.AutoModel.from_pretrained('Contrastive-Tension/RoBerta-Large-CT-STSb')

Inference

To perform inference with the pre-trained models (or other Huggigface models) please see the script ExampleBatchInference.py.
The most important thing to remember when running inference is to apply the attention_masks on the batch output vector before mean pooling, as is done in the example script.

CT Training

To run CT on your own models and text data see ExampleTraining.py for a comprehensive example. This file currently creates a dummy corpus of random text. Simply replace this to whatever corpus you like.

Pre-trained Models

Note that these models are not trained with the exact hyperparameters as those disclosed in the original CT paper. Rather, the parameters are from a short follow-up paper currently under review, which once again pushes the SOTA.

All evaluation is done using the SentEval framework, and shows the: (Pearson / Spearman) correlations

Unsupervised / Zero-Shot

As both the training of BERT, and CT itself is fully self-supervised, the models only tuned with CT require no labeled data whatsoever.
The NLI models however, are first fine-tuned towards a natural language inference task, which requires labeled data.

Model Avg Unsupervised STS STS-b #Parameters
Fully Unsupervised
BERT-Distil-CT 75.12 / 75.04 78.63 / 77.91 66 M
BERT-Base-CT 73.55 / 73.36 75.49 / 73.31 108 M
BERT-Large-CT 77.12 / 76.93 80.75 / 79.82 334 M
Using NLI Data
BERT-Distil-NLI-CT 76.65 / 76.63 79.74 / 81.01 66 M
BERT-Base-NLI-CT 76.05 / 76.28 79.98 / 81.47 108 M
BERT-Large-NLI-CT 77.42 / 77.41 80.92 / 81.66 334 M

Supervised

These models are fine-tuned directly with STS data, using a modified version of the supervised training object proposed by S-BERT.
To our knowledge our RoBerta-Large-STSb is the current SOTA model for STS via sentence embeddings.

Model STS-b #Parameters
BERT-Distil-CT-STSb 84.85 / 85.46 66 M
BERT-Base-CT-STSb 85.31 / 85.76 108 M
BERT-Large-CT-STSb 85.86 / 86.47 334 M
RoBerta-Large-CT-STSb 87.56 / 88.42 334 M

Other Languages

Model Language #Parameters
BERT-Base-Swe-CT-STSb Swedish 108 M

License

Distributed under the MIT License. See LICENSE for more information.

Contact

If you have questions regarding the paper, please consider creating a comment via the official OpenReview submission.
If you have questions regarding the code or otherwise related to this Github page, please open an issue.

For other purposes, feel free to contact me directly at: [email protected]

Acknowledgements

Owner
Fredrik Carlsson
Fredrik Carlsson
Pytorch implementations of the paper Value Functions Factorization with Latent State Information Sharing in Decentralized Multi-Agent Policy Gradients

LSF-SAC Pytorch implementations of the paper Value Functions Factorization with Latent State Information Sharing in Decentralized Multi-Agent Policy G

Hanhan 2 Aug 14, 2022
Angora is a mutation-based fuzzer. The main goal of Angora is to increase branch coverage by solving path constraints without symbolic execution.

Angora Angora is a mutation-based coverage guided fuzzer. The main goal of Angora is to increase branch coverage by solving path constraints without s

833 Jan 07, 2023
Official code for NeurIPS 2021 paper "Towards Scalable Unpaired Virtual Try-On via Patch-Routed Spatially-Adaptive GAN"

Towards Scalable Unpaired Virtual Try-On via Patch-Routed Spatially-Adaptive GAN Official code for NeurIPS 2021 paper "Towards Scalable Unpaired Virtu

68 Dec 21, 2022
Official PyTorch Implementation of GAN-Supervised Dense Visual Alignment

GAN-Supervised Dense Visual Alignment — Official PyTorch Implementation Paper | Project Page | Video This repo contains training, evaluation and visua

944 Jan 07, 2023
Reinforcement learning library(framework) designed for PyTorch, implements DQN, DDPG, A2C, PPO, SAC, MADDPG, A3C, APEX, IMPALA ...

Automatic, Readable, Reusable, Extendable Machin is a reinforcement library designed for pytorch. Build status Platform Status Linux Windows Supported

Iffi 348 Dec 24, 2022
University of Rochester 2021 Summer REU focusing on music sentiment transfer using CycleGAN

Music-Sentiment-Transfer University of Rochester 2021 Summer REU focusing on music sentiment transfer using CycleGAN Poster: Music Sentiment Transfer

Miles Sigel 2 Jan 24, 2022
Avatarify Python - Avatars for Zoom, Skype and other video-conferencing apps.

Avatarify Python - Avatars for Zoom, Skype and other video-conferencing apps.

Ali Aliev 15.3k Jan 05, 2023
The InterScript dataset contains interactive user feedback on scripts generated by a T5-XXL model.

Interscript The Interscript dataset contains interactive user feedback on a T5-11B model generated scripts. Dataset data.json contains the data in an

AI2 8 Dec 01, 2022
Code for the paper "Asymptotics of ℓ2 Regularized Network Embeddings"

README Code for the paper Asymptotics of L2 Regularized Network Embeddings. Requirements Requires Stellargraph 1.2.1, Tensorflow 2.6.0, scikit-learm 0

Andrew Davison 0 Jan 06, 2022
Implementation of paper "Graph Condensation for Graph Neural Networks"

GCond A PyTorch implementation of paper "Graph Condensation for Graph Neural Networks" Code will be released soon. Stay tuned :) Abstract We propose a

Wei Jin 66 Dec 04, 2022
LVI-SAM: Tightly-coupled Lidar-Visual-Inertial Odometry via Smoothing and Mapping

LVI-SAM This repository contains code for a lidar-visual-inertial odometry and mapping system, which combines the advantages of LIO-SAM and Vins-Mono

Tixiao Shan 1.1k Dec 27, 2022
Study of human inductive biases in CNNs and Transformers.

Are Convolutional Neural Networks or Transformers more like human vision? This repository contains the code and fine-tuned models of popular Convoluti

Shikhar Tuli 39 Dec 08, 2022
Probabilistic-Monocular-3D-Human-Pose-Estimation-with-Normalizing-Flows

Probabilistic-Monocular-3D-Human-Pose-Estimation-with-Normalizing-Flows This is the official implementation of the ICCV 2021 Paper "Probabilistic Mono

62 Nov 23, 2022
FID calculation with proper image resizing and quantization steps

clean-fid: Fixing Inconsistencies in FID Project | Paper The FID calculation involves many steps that can produce inconsistencies in the final metric.

Gaurav Parmar 606 Jan 06, 2023
A tutorial on DataFrames.jl prepared for JuliaCon2021

JuliaCon2021 DataFrames.jl Tutorial This is a tutorial on DataFrames.jl prepared for JuliaCon2021. A video recording of the tutorial is available here

Bogumił Kamiński 106 Jan 09, 2023
PIXIE: Collaborative Regression of Expressive Bodies

PIXIE: Collaborative Regression of Expressive Bodies [Project Page] This is the official Pytorch implementation of PIXIE. PIXIE reconstructs an expres

Yao Feng 331 Jan 04, 2023
Repository of our paper 'Refer-it-in-RGBD' in CVPR 2021

Refer-it-in-RGBD This is the repository of our paper 'Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images' in CVPR 2021 Pape

Haolin Liu 34 Nov 07, 2022
SalGAN: Visual Saliency Prediction with Generative Adversarial Networks

SalGAN: Visual Saliency Prediction with Adversarial Networks Junting Pan Cristian Canton Ferrer Kevin McGuinness Noel O'Connor Jordi Torres Elisa Sayr

Image Processing Group - BarcelonaTECH - UPC 347 Nov 22, 2022
2021 Artificial Intelligence Diabetes Datathon

A.I.D.D. 2021 2021 Artificial Intelligence Diabetes Datathon A.I.D.D. 2021은 ‘2021 인공지능 학습용 데이터 구축사업’을 통해 만들어진 학습용 데이터를 활용하여 당뇨병을 효과적으로 예측할 수 있는가에 대한 A

2 Dec 27, 2021
Continuous Diffusion Graph Neural Network

We present Graph Neural Diffusion (GRAND) that approaches deep learning on graphs as a continuous diffusion process and treats Graph Neural Networks (GNNs) as discretisations of an underlying PDE.

Twitter Research 227 Jan 05, 2023