Code for the paper: Sequence-to-Sequence Learning with Latent Neural Grammars

Overview

Sequence-to-Sequence Learning with Latent Neural Grammars

Code for the paper:
Sequence-to-Sequence Learning with Latent Neural Grammars
Yoon Kim
arXiv Preprint

Dependencies

The code was tested in python 3.7 and pytorch 1.5. We also use a slightly modified version of the Torch-Struct library, which is included in the repo and can be installed via:

cd pytorch-struct
python setup.py install

Data

For convenience we include the datasets used in the paper in the data/ folder. Please cite the original papers when using the data (i.e. Lake and Baroni 2018 for SCAN/MT, and Lyu et al. 2021 for StylePTB).

Training

SCAN

To train the model on (for example) the length split:

python train_scan.py --train_file data/SCAN/tasks_train_length.txt --save_path scan-length.pt

For prediction and evaluation:

python predict_scan.py --data_file data/SCAN/tasks_test_length.txt --model_path scan-length.pt

Style Transfer

To train on (for example) the active-to-passive task:

python train_styleptb.py --train_file data/StylePTB/ATP/train.tsv --dev_file data/StylePTB/ATP/valid.tsv --save_path styleptb-atp.pt

To predict:

python predict_styleptb.py --data_file data/StylePTB/ATP/test.tsv --model_path styleptb-atp.pt 
--out_file styleptb-atp-pred.txt

We use the nlg-eval package to calculate the various metrics.

Machine Translation

To train on MT:

python train_mt.py --train_file_src data/MT/train.en --train_file_tgt data/MT/train.fr 
--dev_file_src data/MT/dev.en --dev_file_tgt data/MT/dev.fr --save_path mt.pt

To predict on the daxy test set:

python predict_mt.py --data_file data/MT/test-daxy.en --model_path mt.pt --out_file mt-pred-daxy.txt

For the regular test set:

python predict_mt.py --data_file data/MT/test.en --model_path mt.pt --out_file mt-pred.txt

We use the multi-bleu script to calculate BLEU.

Training Stability

We observed training to be unstable and the approach required several runs across different seeds to perform well. For reference we have posted logs of some example runs in the logs/ folder.

License

MIT

Owner
Yoon Kim
Yoon Kim
LegalNLP - Natural Language Processing Methods for the Brazilian Legal Language

LegalNLP - Natural Language Processing Methods for the Brazilian Legal Language ⚖️ The library of Natural Language Processing for Brazilian legal lang

Felipe Maia Polo 125 Dec 20, 2022
Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.

Summarization, translation, Q&A, text generation and more at blazing speed using a T5 version implemented in ONNX. This package is still in alpha stag

Abel 211 Dec 28, 2022
中文无监督SimCSE Pytorch实现

A PyTorch implementation of unsupervised SimCSE SimCSE: Simple Contrastive Learning of Sentence Embeddings 1. 用法 无监督训练 python train_unsup.py ./data/ne

99 Dec 23, 2022
TTS is a library for advanced Text-to-Speech generation.

TTS is a library for advanced Text-to-Speech generation. It's built on the latest research, was designed to achieve the best trade-off among ease-of-training, speed and quality. TTS comes with pretra

Mozilla 6.5k Jan 08, 2023
A notebook that shows how to import the IITB English-Hindi Parallel Corpus from the HuggingFace datasets repository

We provide a notebook that shows how to import the IITB English-Hindi Parallel Corpus from the HuggingFace datasets repository. The notebook also shows how to segment the corpus using BPE tokenizatio

Computation for Indian Language Technology (CFILT) 9 Oct 13, 2022
A script that automatically creates a branch name using google translation api and jira api

About google translation api와 jira api을 사용하여 자동으로 브랜치 이름을 만들어주는 스크립트 Setup 환경변수에 다음 3가지를 등록해야 한다. JIRA_USER : JIRA email (ex: hyunwook.kim 2 Dec 20, 2021

An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

CRNN paper:An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition 1. create your ow

Tsukinousag1 3 Apr 02, 2022
Repository for Graph2Pix: A Graph-Based Image to Image Translation Framework

Graph2Pix: A Graph-Based Image to Image Translation Framework Installation Install the dependencies in env.yml $ conda env create -f env.yml $ conda a

18 Nov 17, 2022
PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers

PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers

Microsoft 105 Jan 08, 2022
auto_code_complete is a auto word-completetion program which allows you to customize it on your need

auto_code_complete v1.3 purpose and usage auto_code_complete is a auto word-completetion program which allows you to customize it on your needs. the m

RUO 2 Feb 22, 2022
Perform sentiment analysis and keyword extraction on Craigslist listings

craiglist-helper synopsis Perform sentiment analysis and keyword extraction on Craigslist listings Background I love Craigslist. I've found most of my

Mark Musil 1 Nov 08, 2021
Kurumi ChatBot

KurumiChatBot Just another Telegram AI chat bot written in Python using Pyrogram. A public running instance can be found on telegram as @TokisakiChatB

Yoga Pranata 3 Jun 28, 2022
NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations

NL-Augmenter 🦎 → 🐍 The NL-Augmenter is a collaborative effort intended to add transformations of datasets dealing with natural language. Transformat

684 Jan 09, 2023
Semantic search through a vectorized Wikipedia (SentenceBERT) with the Weaviate vector search engine

Semantic search through Wikipedia with the Weaviate vector search engine Weaviate is an open source vector search engine with build-in vectorization a

SeMI Technologies 191 Dec 26, 2022
Simple Python script to scrape youtube channles of "Parity Technologies and Web3 Foundation" and translate them to well-known braille language or any language

Simple Python script to scrape youtube channles of "Parity Technologies and Web3 Foundation" and translate them to well-known braille language or any

Little Endian 1 Apr 28, 2022
PyTorch code for EMNLP 2019 paper "LXMERT: Learning Cross-Modality Encoder Representations from Transformers".

LXMERT: Learning Cross-Modality Encoder Representations from Transformers Our servers break again :(. I have updated the links so that they should wor

Hao Tan 838 Dec 19, 2022
Programme de chiffrement et de déchiffrement inverse d'un message en python3.

Chiffrement Inverse En Python3 Programme de chiffrement et de déchiffrement inverse d'un message en python3. Explication du chiffrement inverse avec c

Malik Makkes 2 Mar 26, 2022
Simple telegram bot to convert files into direct download link.you can use telegram as a file server 🪁

TGCLOUD 🪁 Simple telegram bot to convert files into direct download link.you can use telegram as a file server 🪁 Features Easy to Deploy Heroku Supp

Mr.Acid dev 6 Oct 18, 2022
Tracking Progress in Natural Language Processing

Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

Sebastian Ruder 21.2k Dec 30, 2022
Python3 to Crystal Translation using Python AST Walker

py2cr.py A code translator using AST from Python to Crystal. This is basically a NodeVisitor with Crystal output. See AST documentation (https://docs.

66 Jul 25, 2022