[NAACL & ACL 2021] SapBERT: Self-alignment pretraining for BERT.

Last update: Dec 07, 2022

Overview

SapBERT: Self-alignment pretraining for BERT

This repo holds code for the SapBERT model presented in our NAACL 2021 paper: Self-Alignment Pretraining for Biomedical Entity Representations [arxiv]; and our ACL 2021 paper: Learning Domain-Specialised Representations for Cross-Lingual Biomedical Entity Linking [PDF].

Huggingface Models

[SapBERT]

Standard SapBERT as described in [Liu et al., NAACL 2021]. Trained with UMLS 2020AA (English only), using microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext as the base model. Use [CLS] (before pooler) as the representation of the input.

[SapBERT-XLMR]

Cross-lingual SapBERT as described in [Liu et al., ACL 2021]. Trained with UMLS 2020AB (all languages), using xlm-roberta-base as the base model. Use [CLS] (before pooler) as the representation of the input.
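The "[CLS] (before pooler)" instruction for the two models above can be sketched with the transformers AutoModel API. This is a minimal sketch, not the repository's own inference code: the helper names `batches` and `encode_cls`, the `batch_size` and the `max_length=25` default are assumptions made for illustration, and the Hub identifier is left as a parameter because it is not spelled out here.

```python
def batches(items, batch_size):
    """Yield consecutive slices of `items` holding at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def encode_cls(names, model_name, batch_size=128, max_length=25):
    """Return one first-token vector per name, taken before the pooler.

    `model_name` is whatever Hub ID or local path you use for SapBERT or
    SapBERT-XLMR (hypothetical parameter, not defined by this README).
    """
    # Imported lazily so the module can be loaded without torch installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    chunks = []
    with torch.no_grad():
        for chunk in batches(names, batch_size):
            enc = tokenizer(chunk, padding=True, truncation=True,
                            max_length=max_length, return_tensors="pt")
            hidden = model(**enc).last_hidden_state  # (batch, seq_len, hidden)
            # Position 0 holds [CLS] for BERT and <s> for XLM-R; indexing
            # last_hidden_state skips the pooler layer entirely.
            chunks.append(hidden[:, 0, :])
    return torch.cat(chunks, dim=0)
```

Calling `encode_cls(["covid-19", "high fever"], model_name)` would return a tensor with one row per input name. Using `pooler_output` instead would apply the extra pooler layer, which is what the instruction above says to avoid.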

[SapBERT-mean-token]

Same as the standard SapBERT but trained with mean-pooling instead of [CLS] representations.
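To make the difference between the two representations concrete, here is a framework-free sketch on plain Python lists. It assumes mean-pooling averages the vectors of non-padding tokens as selected by an attention mask; whether the released model treats special tokens or padding exactly this way is not stated here, so treat that detail as an assumption. The function names are illustrative only.

```python
def cls_pool(token_vectors):
    """Return the vector at position 0, where BERT-style tokenizers place [CLS]."""
    return list(token_vectors[0])


def mean_pool(token_vectors, attention_mask):
    """Average the vectors of real tokens (mask 1) and skip padding (mask 0)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


# One padded sequence: three real tokens followed by one padding position.
hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]

print(cls_pool(hidden))         # [1.0, 2.0]
print(mean_pool(hidden, mask))  # [3.0, 4.0]
```

With tensors, the same idea is usually written as a masked sum over the sequence dimension divided by the number of unmasked tokens.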

Environment

The code is tested with Python 3.8, torch 1.7.0 and Hugging Face transformers 4.4.2. See requirements.txt for the full list of dependencies.
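A quick way to compare a local setup against the tested versions is sketched below, using only the standard library. The `TESTED` table simply restates the versions named above; the helper names are hypothetical, and other versions may well work.

```python
import sys
from importlib import metadata

# Versions this README reports as tested.
TESTED = {"torch": "1.7.0", "transformers": "4.4.2"}


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def report(tested=TESTED):
    """Return human-readable lines comparing installed and tested versions."""
    major, minor = sys.version_info[:2]
    lines = [f"python: running {major}.{minor}, tested 3.8"]
    for name, have in installed_versions(tested).items():
        lines.append(f"{name}: installed {have or 'not installed'}, tested {tested[name]}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```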

Train SapBERT

Prepare the training data as instructed in data/generate_pretraining_data.ipynb.

Run:

cd umls_pretraining
./pretrain.sh 0,1

where 0,1 specifies the GPU devices.
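How pretrain.sh consumes the device argument is not shown here. A common convention, assumed in this sketch, is to expose only the listed GPUs to the training process through the `CUDA_VISIBLE_DEVICES` environment variable. The helper names below are hypothetical and not part of the repository.

```python
import os


def parse_gpu_ids(spec):
    """Turn a device string such as "0,1" into a list of integer GPU indices."""
    ids = [int(part) for part in spec.split(",") if part.strip()]
    if not ids:
        raise ValueError("no GPU index given")
    if len(set(ids)) != len(ids):
        raise ValueError(f"duplicate GPU index in {spec!r}")
    return ids


def restrict_visible_gpus(spec, env=None):
    """Return a copy of `env` in which only the requested GPUs are visible."""
    env = dict(os.environ if env is None else env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in parse_gpu_ids(spec))
    return env


if __name__ == "__main__":
    child_env = restrict_visible_gpus("0,1")
    print(child_env["CUDA_VISIBLE_DEVICES"])
```

The returned dictionary could be passed as `env=` to `subprocess.run` when launching a training script from Python.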

Evaluate SapBERT

Please view evaluation/README.md for details.

Citations

@article{liu2021self,
	title={Self-Alignment Pretraining for Biomedical Entity Representations},
	author={Liu, Fangyu and Shareghi, Ehsan and Meng, Zaiqiao and Basaldella, Marco and Collier, Nigel},
	journal={arXiv preprint arXiv:2010.11784},
	year={2020}
}

Acknowledgement

Parts of the code are modified from BioSyn. We thank the authors for open-sourcing BioSyn.

License

SapBERT is MIT licensed. See the LICENSE file for details.

Owner

Cambridge Language Technology Lab
