SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples

Last update: Jan 02, 2023

Overview

SNCSE

This is the repository for SNCSE.

SNCSE aims to alleviate feature suppression in contrastive learning for unsupervised sentence embedding. Here, feature suppression means that models fail to distinguish and decouple textual similarity from semantic similarity. As a result, they may overestimate the semantic similarity of pairs with similar wording, regardless of the actual semantic difference between them, and underestimate the semantic similarity of pairs with fewer words in common. (Please refer to Section 5 of our paper for several instances and a detailed analysis.) To this end, we propose taking the negation of the original sentences as soft negative samples and introducing them into the traditional contrastive learning framework through a bidirectional margin loss (BML). The structure of SNCSE is as follows:

The performance of SNCSE on the STS task with different encoders is:

To reproduce the above results, please download the files and unzip them to replace the original file folder. Then download the models, modify the file path variables, and run:

python bert_prediction.py
python roberta_prediction.py

To train SNCSE, please download the training file and put it in /SNCSE/data. You can either run:

python generate_soft_negative_samples.py

to generate soft negative samples, or use our file at /Files/soft_negative_samples.txt. Then you may modify and run train_SNCSE.sh.
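
During training, the soft negative samples enter the objective through the bidirectional margin loss mentioned in the introduction. The snippet below is a minimal sketch of that idea, not the code used by train_SNCSE.sh. It assumes the loss is a two-sided hinge on the difference between the anchor's similarity to its soft negative and its similarity to its positive. The function names and the margin values `alpha` and `beta` are illustrative assumptions; please consult the paper for the exact definition and hyperparameters.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def bidirectional_margin_loss(anchor, positive, soft_negative,
                              alpha=0.1, beta=0.3):
    """Two-sided hinge loss (sketch; alpha and beta are placeholder margins).

    delta is the similarity to the soft negative minus the similarity to
    the positive. The loss is zero while -beta <= delta <= -alpha.
    """
    delta = cosine(anchor, soft_negative) - cosine(anchor, positive)
    too_close = max(0.0, delta + alpha)   # soft negative nearly as similar as the positive
    too_far = max(0.0, -delta - beta)     # soft negative pushed away like a hard negative
    return too_close + too_far
```

With these placeholder margins, a soft negative whose similarity is 0.2 below that of the positive incurs no loss, while one that is as similar as the positive, or far less similar, is penalized.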

To evaluate the checkpoints saved during training on the development set of the STSB task, please run:

python bert_evaluation.py
python roberta_evaluation.py
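
For readers unfamiliar with STS evaluation, the sketch below shows how such a score is commonly computed: the cosine similarity of each sentence pair's embeddings is compared with the gold similarity scores using Spearman's rank correlation. This is an assumption about the usual protocol, not the code in bert_evaluation.py or roberta_evaluation.py, and all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))


def ranks(values):
    """1-based ranks, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def sts_spearman(embedding_pairs, gold_scores):
    """Spearman correlation between predicted cosine similarities and gold scores."""
    predicted = [cosine(u, v) for u, v in embedding_pairs]
    return pearson(ranks(predicted), ranks(gold_scores))
```

A higher correlation means the ordering of sentence pairs by embedding similarity agrees more closely with the human-annotated ordering.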

Feel free to contact the authors at [email protected] for any questions.

Please cite SNCSE as

{
Hao Wang, Yangguang Li, Zhen Huang, Yong Dou, Lingpeng Kong, Jing Shao.
SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples.
CoRR, abs/2201.05979, 2022.
}

Owner

Sense-GVT
