Code to train models from "Paraphrastic Representations at Scale".

Overview

Paraphrastic Representations at Scale

Code to train models from "Paraphrastic Representations at Scale".

The code is written in Python 3.7 and requires H5py, jieba, numpy, scipy, sentencepiece, sacremoses, and PyTorch >= 1.0 libraries. These can be insalled with the following command:

pip install -r requirements.txt

To get started, download the data files used for training from http://www.cs.cmu.edu/~jwieting and download the STS evaluation data:

wget http://phontron.com/data/paraphrase-at-scale.zip
unzip paraphrase-at-scale.zip
rm paraphrase-at-scale.zip
wget http://www.cs.cmu.edu/~jwieting/STS.zip .
unzip STS.zip
rm STS.zip

If you use our code, models, or data for your work please cite:

@article{wieting2021paraphrastic,
    title={Paraphrastic Representations at Scale},
    author={Wieting, John and Gimpel, Kevin and Neubig, Graham and Berg-Kirkpatrick, Taylor},
    journal={arXiv preprint arXiv:2104.15114},
    year={2021}
}

@inproceedings{wieting19simple,
    title={Simple and Effective Paraphrastic Similarity from Parallel Translations},
    author={Wieting, John and Gimpel, Kevin and Neubig, Graham and Berg-Kirkpatrick, Taylor},
    booktitle={Proceedings of the Association for Computational Linguistics},
    url={https://arxiv.org/abs/1909.13872},
    year={2019}
}

To embed a list of sentences:

python -u embed_sentences.py --sentence-file paraphrase-at-scale/example-sentences.txt --load-file paraphrase-at-scale/model.para.lc.100.pt  --sp-model paraphrase-at-scale/paranmt.model --output-file sentence_embeds.np --gpu 0

To score a list of sentence pairs:

python -u score_sentence_pairs.py --sentence-pair-file paraphrase-at-scale/example-sentences-pairs.txt --load-file paraphrase-at-scale/model.para.lc.100.pt  --sp-model paraphrase-at-scale/paranmt.model --gpu 0

To train a model (for example, on ParaNMT):

python -u main.py --outfile model.para.out --lower-case 1 --tokenize 0 --data-file paraphrase-at-scale/paranmt.sim-low=0.4-sim-high=1.0-ovl=0.7.final.h5 \
       --model avg --dim 1024 --epochs 25 --dropout 0.0 --sp-model paraphrase-at-scale/paranmt.model --megabatch-size 100 --save-every-epoch 1 --gpu 0 --vocab-file paraphrase-at-scale/paranmt.sim-low=0.4-sim-high=1.0-ovl=0.7.final.vocab

To download and preprocess raw data for training models (both bilingual and ParaNMT), see preprocess/bilingual and preprocess/paranmt.

Owner
John Wieting
John Wieting
BuildingNet: Learning to Label 3D Buildings

BuildingNet This is the implementation of the BuildingNet architecture described in this paper: Paper: BuildingNet: Learning to Label 3D Buildings Arx

16 Nov 07, 2022
Official implementation for the paper "Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D Object Detection"

Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D Object Detection PyTorch code release of the paper "Attentive Prototypes for Sour

Deepti Hegde 23 Oct 17, 2022
Demonstrates iterative FGSM on Apple's NeuralHash model.

apple-neuralhash-attack Demonstrates iterative FGSM on Apple's NeuralHash model. TL;DR: It is possible to apply noise to CSAM images and make them loo

Lim Swee Kiat 11 Jun 23, 2022
ROMP: Monocular, One-stage, Regression of Multiple 3D People, ICCV21

Monocular, One-stage, Regression of Multiple 3D People ROMP, accepted by ICCV 2021, is a concise one-stage network for multi-person 3D mesh recovery f

Yu Sun 937 Jan 04, 2023
A Pytorch Implementation of a continuously rate adjustable learned image compression framework.

GainedVAE A Pytorch Implementation of a continuously rate adjustable learned image compression framework, Gained Variational Autoencoder(GainedVAE). N

39 Dec 24, 2022
Deep-learning X-Ray Micro-CT image enhancement, pore-network modelling and continuum modelling

EDSR modelling A Github repository for deep-learning image enhancement, pore-network and continuum modelling from X-Ray Micro-CT images. The repositor

Samuel Jackson 7 Nov 03, 2022
Art Project "Schrödinger's Game of Life"

Repo of the project "Team Creative Quantum AI: Schrödinger's Game of Life" Installation new conda env: conda create --name qcml python=3.8 conda activ

ℍ◮ℕℕ◭ℍ ℝ∈ᛔ∈ℝ 2 Sep 15, 2022
Using BERT+Bi-LSTM+CRF

Chinese Medical Entity Recognition Based on BERT+Bi-LSTM+CRF Step 1 I share the dataset on my google drive, please download the whole 'CCKS_2019_Task1

Xiang WU 55 Dec 21, 2022
PyTorch implementation of TSception V2 using DEAP dataset

TSception This is the PyTorch implementation of TSception V2 using DEAP dataset in our paper: Yi Ding, Neethu Robinson, Su Zhang, Qiuhao Zeng, Cuntai

Yi Ding 27 Dec 15, 2022
This is a classifier which basically predicts whether there is a gun law in a state or not, depending on various things like murder rates etc.

Gun-Laws-Classifier This is a classifier which basically predicts whether there is a gun law in a state or not, depending on various things like murde

Awais Saleem 1 Jan 20, 2022
一个运行在 𝐞𝐥𝐞𝐜𝐕𝟐𝐏 或 𝐪𝐢𝐧𝐠𝐥𝐨𝐧𝐠 等定时面板的签到项目

定时面板上的签到盒 一个运行在 𝐞𝐥𝐞𝐜𝐕𝟐𝐏 或 𝐪𝐢𝐧𝐠𝐥𝐨𝐧𝐠 等定时面板的签到项目 𝐞𝐥𝐞𝐜𝐕𝟐𝐏 𝐪𝐢𝐧𝐠𝐥𝐨𝐧𝐠 特别声明 本仓库发布的脚本及其中涉及的任何解锁和解密分析脚本,仅用于测试和学习研究,禁止用于商业用途,不能保证其合

Leon 1.1k Dec 30, 2022
An executor that loads ONNX models and embeds documents using the ONNX runtime.

ONNXEncoder An executor that loads ONNX models and embeds documents using the ONNX runtime. Usage via Docker image (recommended) from jina import Flow

Jina AI 2 Mar 15, 2022
Bayesian regularization for functional graphical models.

BayesFGM Paper: Jiajing Niu, Andrew Brown. Bayesian regularization for functional graphical models. Requirements R version 3.6.3 and up Python 3.6 and

0 Oct 07, 2021
TransCD: Scene Change Detection via Transformer-based Architecture

TransCD: Scene Change Detection via Transformer-based Architecture

wangzhixue 29 Dec 11, 2022
NaturalProofs: Mathematical Theorem Proving in Natural Language

NaturalProofs: Mathematical Theorem Proving in Natural Language NaturalProofs: Mathematical Theorem Proving in Natural Language Sean Welleck, Jiacheng

Sean Welleck 83 Jan 05, 2023
Greedy Gaussian Segmentation

GGS Greedy Gaussian Segmentation (GGS) is a Python solver for efficiently segmenting multivariate time series data. For implementation details, please

Stanford University Convex Optimization Group 72 Dec 07, 2022
Translate darknet to tensorflow. Load trained weights, retrain/fine-tune using tensorflow, export constant graph def to mobile devices

Intro Real-time object detection and classification. Paper: version 1, version 2. Read more about YOLO (in darknet) and download weight files here. In

Trieu 6.1k Dec 30, 2022
Official Pytorch implementation of the paper: "Locally Shifted Attention With Early Global Integration"

Locally-Shifted-Attention-With-Early-Global-Integration Pretrained models You can download all the models from here. Training Imagenet python -m torch

Shelly Sheynin 14 Apr 15, 2022
ALFRED - A Benchmark for Interpreting Grounded Instructions for Everyday Tasks

ALFRED A Benchmark for Interpreting Grounded Instructions for Everyday Tasks Mohit Shridhar, Jesse Thomason, Daniel Gordon, Yonatan Bisk, Winson Han,

ALFRED 204 Dec 15, 2022
Deconfounding Temporal Autoencoder: Estimating Treatment Effects over Time Using Noisy Proxies

Deconfounding Temporal Autoencoder (DTA) This is a repository for the paper "Deconfounding Temporal Autoencoder: Estimating Treatment Effects over Tim

Milan Kuzmanovic 3 Feb 04, 2022