[EMNLP 2021] LM-Critic: Language Models for Unsupervised Grammatical Error Correction

Last update: Nov 24, 2022

Overview

LM-Critic: Language Models for Unsupervised Grammatical Error Correction

This repo provides the source code & data of our paper: LM-Critic: Language Models for Unsupervised Grammatical Error Correction (EMNLP 2021).

@InProceedings{yasunaga2021language,
  author =  {Michihiro Yasunaga and Jure Leskovec and Percy Liang},
  title =   {LM-Critic: Language Models for Unsupervised Grammatical Error Correction},
  year =    {2021},  
  booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},  
}

Overview

We developed a new method to use a pretrained language model (e.g. GPT2) to predict if a sentence is grammatical, which we call LM-Critic. You can play with this LM-Critic as described in Section 1. below. The idea is to deem a sentence to be grammatical if the language model assigns it a higher probability than candidates in its local neighborhood.

We then use the LM-Critic to generate training data for grammatical error correction (GEC) from unlabeled raw text, using the BIFI algorithm. This allows us to train GEC models in an unsupervised way. See Section 2. below.

How LM-Critic works

LM-Critic for GEC: We use LM-Critic to learn GEC models

0. Dependencies

Run the following commands to create a conda environment (assuming CUDA10.1):

conda create -n lm-critic python=3.8
conda activate lm-critic
pip install torch==1.6.0 torchvision==0.7.0
pip install transformers==4.3.3 datasets==1.3.0 absl-py rouge-score
pip install nltk wandb editdistance spacy==3.0.5
python3 -m nltk.downloader punkt

To use the ERRANT scorer for GEC evaluation, create another conda environment separately, as follows:

conda create -n errant200 python=3.6
conda activate errant200
pip3 install errant==2.0.0
python3 -m spacy download en

1. Use LM-Critic

The LM-Critic is defined in critic/critic.py. To play with it, you can run:

CUDA_VISIBLE_DEVICES=0 python3 critic/critic.py

This will prompt you for a sentence input, and returns the judgment (Good: grammatical, Bad: ungrammatical) along with the probability score of the input sentence. For example,

Enter a sentence: I like apple.
Bad! Your sentence log(p) = -22.333
Neighbor sentence with highest log(p): I like apples. (= -19.570)

Enter a sentence: I like apples.
Good! Your sentence log(p) = -19.570

To run intrinsic evaluation of LM-Critic on a test suite, run:

CUDA_VISIBLE_DEVICES=0 python3 eval_critic/eval_critic.py

You can import the LM-Critic function (from critic.critic import gpt2_critic) for your own code as done in this script.

2. Train/run grammatical error correction models

Change the working directory to gec/. First, download all the data (GEC benchmarks and training data) by running ./download_data.sh.

Round 0

Here we train an initial fixer on synthetic GEC data. Run the commands in src/run-round0.sh.

This corresponds to the "Transformer" baseline in the paper Table 4.
The original synthetic data was dowloaded from here, and our processed data is available at data/round0__synthetic/synthetic_paired_data_9M.json

Round 1

Here we use the BIFI algorithm and unlabeled text data to train an improved fixer. Run the commands in src/run-round1.sh.

Specifically, we perform the following four steps: (a) apply the current fixer (from Round 0) to unlabeled sentences and keep outputs that LM-Critic judges as good; (b) train a breaker on the paired data generated in Step (a); (c) apply the trained breaker on unlabeled sentences and keep outputs that LM-Critic judges as bad; (d) train the fixer on the paired data generated so far (Step (a) + Step (c) + synthetic data from Round0).
This corresponds to the "+ BIFI" in the paper Table 4.
The original unlabeled text data was downloaded from Yahoo! Answer dataset and Wikipedia revision dataset (we take sentences pre revision). Our processed paired data used in Step (d) is available at data/round1__BIFI/BIFI_paired_data_9M.json

For evaluation, we use ERRANT and M^2Scorer. ERRANT is set up in the conda environment described above (errant200) and M^2Scorer is set up in the download script.

[EMNLP 2021] LM-Critic: Language Models for Unsupervised Grammatical Error Correction

Related tags

Overview

LM-Critic: Language Models for Unsupervised Grammatical Error Correction

Overview

0. Dependencies

1. Use LM-Critic

2. Train/run grammatical error correction models

Round 0

Round 1

Owner

Michihiro Yasunaga

Différents programmes créant une interface graphique a l'aide de Tkinter pour simplifier la vie des étudiants.

Biterm Topic Model (BTM): modeling topics in short texts

VD-BERT: A Unified Vision and Dialog Transformer with BERT

NLP Text Classification

☀️ Measuring the accuracy of BBC weather forecasts in Honolulu, USA

Code for the project carried out fulfilling the course requirements for Fall 2021 NLP at NYU

Exploration of BERT-based models on twitter sentiment classifications

A PyTorch implementation of the Transformer model in "Attention is All You Need".

Chinese NewsTitle Generation Project by GPT2.带有超级详细注释的中文GPT2新闻标题生成项目。

A list of NLP(Natural Language Processing) tutorials

Disfl-QA: A Benchmark Dataset for Understanding Disfluencies in Question Answering

Translation to python of Chris Sims' optimization function

NeuralQA: A Usable Library for Question Answering on Large Datasets with BERT

2021海华AI挑战赛·中文阅读理解·技术组·第三名

Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.

Summarization module based on KoBART

🚀 RocketQA, dense retrieval for information retrieval and question answering, including both Chinese and English state-of-the-art models.

Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization (ACL 2021)

A 10000+ hours dataset for Chinese speech recognition