Tracking Progress in Natural Language Processing

Last update: Dec 30, 2022

Overview

English

Vietnamese

Hindi

Chinese

For more tasks, datasets and results in Chinese, check out the Chinese NLP website.

French

Russian

Spanish

Persian

Turkish

Summarization

German

Summarization

This document aims to track the progress in Natural Language Processing (NLP) and give an overview of the state-of-the-art (SOTA) across the most common NLP tasks and their corresponding datasets.

It aims to cover both traditional and core NLP tasks such as dependency parsing and part-of-speech tagging as well as more recent ones such as reading comprehension and natural language inference. The main objective is to provide the reader with a quick overview of benchmark datasets and the state-of-the-art for their task of interest, which serves as a stepping stone for further research. To this end, if there is a place where results for a task are already published and regularly maintained, such as a public leaderboard, the reader will be pointed there.

If you want to find this document again in the future, just go to nlpprogress.com or nlpsota.com in your browser.

Contributing

Guidelines

Results: Results reported in published papers are preferred; an exception may be made for influential preprints.

Datasets: Datasets should have been used for evaluation in at least one published paper besides the one that introduced the dataset.

Code: We recommend adding a link to an implementation if available. You can add a Code column (see below) to the table if it does not exist. In the Code column, indicate an official implementation with Official. If an unofficial implementation is available, use Link (see below). If no implementation is available, you can leave the cell empty.

Adding a new result

If you would like to add a new result, click on the small edit button in the top-right corner of the file for the respective task.

This allows you to edit the file in Markdown. Simply add a row to the corresponding table in the same format. Make sure that the table stays sorted (with the best result on top). After you've made your change, check that the table still looks OK by clicking on the "Preview changes" tab at the top of the page. If everything looks good, go to the form at the bottom of the page.

Add a name for your proposed change, an optional description, indicate that you would like to "Create a new branch for this commit and start a pull request", and click on "Propose file change".
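Keeping a table sorted while adding a row can be checked with a few lines of Python. The following is only a minimal sketch: the helper names (parse_row, format_row, add_result), the column index of the score, and the sample models are assumptions made for illustration, and it assumes that a higher score is better.

```python
def parse_row(line):
    """Split a Markdown table row into stripped cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def format_row(cells):
    """Join cells back into a Markdown table row."""
    return "| " + " | ".join(cells) + " |"


def add_result(table, new_row, score_col=1):
    """Return the table with new_row inserted so scores stay in descending order.

    table: list of Markdown lines (header, separator, then result rows).
    new_row: list of cell strings for the new result.
    score_col: index of the column holding the numeric score (an assumption).
    """
    header, separator, *rows = table
    parsed = [parse_row(row) for row in rows] + [new_row]
    # Assumes higher is better; use reverse=False for metrics such as perplexity.
    parsed.sort(key=lambda cells: float(cells[score_col]), reverse=True)
    return [header, separator] + [format_row(cells) for cells in parsed]


# Hypothetical table contents, for illustration only.
table = [
    "| Model | Score | Paper / Source | Code |",
    "| --- | :---: | --- | --- |",
    "| Model A | 92.1 | Paper A | Official |",
    "| Model B | 90.4 | Paper B | |",
]
updated = add_result(table, ["Model C", "91.3", "Paper C", "Link"])
print("\n".join(updated))
```

The sketch assumes that every score cell contains a plain number; cells with footnote markers or multiple metrics would need extra handling.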

Adding a new dataset or task

For adding a new dataset or task, you can also follow the steps above. Alternatively, you can fork the repository. In both cases, follow the steps below:

If your task is completely new, create a new file and link to it in the table of contents above.
If not, add your task or dataset to the respective section of the corresponding file (in alphabetical order).
Briefly describe the dataset/task and include relevant references.
Describe the evaluation setting and evaluation metric.
Show what an annotated example of the dataset/task looks like.
Add a download link if available.
Copy the table below and fill in at least two results (including the state-of-the-art) for your dataset/task (change Score to the metric of your dataset). If your dataset/task has multiple metrics, add them to the right of Score.
Submit your change as a pull request.

| Model | Score | Paper / Source | Code |
| ------------- | :-----:| --- | --- |

Wish list

These are tasks and datasets that are still missing:

Bilingual dictionary induction
Discourse parsing
Keyphrase extraction
Knowledge base population (KBP)
More dialogue tasks
Semi-supervised learning
Frame-semantic parsing (FrameNet full-sentence analysis)

Exporting into a structured format

You can extract all the data into a structured, machine-readable JSON format with parsed tasks, descriptions and SOTA tables.

The instructions are in structured/README.md.
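As a rough illustration of what such an export involves, the sketch below turns one Markdown results table into JSON records. It is not the repository's exporter (structured/README.md documents that); the function name table_to_records, the sample rows, and the output keys "task" and "sota" are assumptions made for this example.

```python
import json


def table_to_records(lines):
    """Convert a Markdown pipe table into a list of dicts keyed by header."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in lines
        if line.strip().startswith("|")
    ]
    # rows[0] is the header and rows[1] is the |---| separator line.
    header, body = rows[0], rows[2:]
    return [dict(zip(header, row)) for row in body]


# Hypothetical table contents, for illustration only.
markdown = """
| Model | Score | Paper / Source | Code |
| --- | :---: | --- | --- |
| Model A | 92.1 | Paper A | Official |
| Model B | 90.4 | Paper B | Link |
""".splitlines()

records = table_to_records(markdown)
print(json.dumps({"task": "Example task", "sota": records}, indent=2))
```

Scores are kept as strings here so that nothing is lost when a cell contains more than a plain number.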

Instructions for building the site locally

Instructions for building the website locally using Jekyll can be found here.

Comments

CoNLL-2003 incomparable results

Because of the small size of the CoNLL-2003 training set, some authors incorporated the development set into the training data after tuning the hyper-parameters. Consequently, not all results are directly comparable.

Train+dev:

Flair embeddings (Akbik et al., 2018)
Peters et al. (2017)
Yang et al. (2017)

Maybe those results should be marked by an asterisk.

opened by ghaddarAbs 28
NLP Progress Graph

Hi Sebastian, loved your idea for this repo. I was wondering whether we could have a graph showing the progress of different tasks in NLP based on the updates to their Markdown files. I have created a shell script which clones your repo locally and counts the number of commits for each file; it then uses Python/pandas to preprocess the result, creates a bar chart from it, and uploads the chart to a free image uploading service.

Currently, it shows the count of all commits for a specific file, but if we had a guideline for adding new results, fixing errors, etc. (maybe with different identifiers), we could count the number of times a new result has been added to an NLP task. This can help in visualizing the most active or fastest-improving areas of NLP research.

Currently, the graph doesn't make much sense, but it will improve over time as we update with more results.

Also, if you think something like this can benefit the community, I can create a cron job on my PC (I don't have a server) which will update the image URL with the latest graph, which you can show on the main page.
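The counting step described here could be sketched roughly as follows. This is not the script mentioned above: it uses plain Python with collections.Counter instead of shell and pandas, the function names are hypothetical, and the sample paths are made up for illustration. It relies on `git log --name-only --pretty=format:`, which prints only the paths touched by each commit.

```python
import subprocess
from collections import Counter


def count_commits_per_file(log_output):
    """Count how often each path appears in `git log --name-only` output."""
    return Counter(
        line.strip() for line in log_output.splitlines() if line.strip()
    )


def read_git_log(repo_dir):
    """Run git in repo_dir and return the touched paths of every commit."""
    result = subprocess.run(
        ["git", "log", "--name-only", "--pretty=format:"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout


# Made-up log output so the example runs without a cloned repository.
sample_log = """
english/summarization.md
english/language_modeling.md

english/summarization.md
README.md
"""
counts = count_commits_per_file(sample_log)
for path, n in counts.most_common():
    print(f"{n:3d}  {path}")
```

With a local clone, `count_commits_per_file(read_git_log("path/to/clone"))` would give the per-file counts that a bar chart could then be built from.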

opened by nirmalsinghania2008 16
YAML - pros and cons
I'd like to discuss here the pros and cons of using YAML going forward or whether we should stick with Markdown tables. Here are some pros and cons, mainly from @NirantK (in https://github.com/sebastianruder/NLP-progress/pull/116), @stared (in https://github.com/sebastianruder/NLP-progress/issues/43, https://github.com/sebastianruder/NLP-progress/pull/64) and myself.

Pros:

Easier trend spotting in performance improvements

Easy to create plots and visualizations going forward

Data is separated from presentation

Cons:

Hard for contributors, e.g. HTML omissions can't be spotted without setting up Jekyll locally

Github Repo becomes useless for readers, relying exclusively on nlpprogress.com

Many visualizations (e.g. bar charts) based on performance numbers are not more useful than the raw tables

Other opinions are welcome.
opened by sebastianruder 10
What about other languages?

Thanks for this work!

These pages seem to cover the progress only for English (well, except MT). Do you have plans to include other languages?

One extreme example is POS tagging and dependency parsing. UD has 60+ languages :) For others, there should be very limited data

opened by Hrant-Khachatrian 10
Incorrect BLEU score for English-Hindi MT System

The BLEU score listed in the document is 89.35, which looks wrong to me. The referenced paper reports a BLEU score of 12.83, which itself is not state-of-the-art for the language pair.

opened by kartikeypant 7
add G2P conversion task of schwa deletion to Hindi

There's been a good body of previous work on schwa deletion in NLP/CL, you can see some of it in our paper. It'll be good to keep track of the SOTA on it since it's an important task for G2P conversion in North Indian languages.

opened by aryamanarora 6
Added new task: data-to-text generation

I have added a new task of Data-to-Text Natural Language Generation (D2T NLG). D2T NLG differs from other NLG tasks such as MT or QA in that the input to the text generation system is a structured representation (table, knowledge graph, or JSON) instead of unstructured text. This document provides an overview of the three most recent and popular publicly available datasets for D2T NLG. With the advancements in deep learning, several novel neural methods are being proposed that are capable of generating accurate, fluent and diverse texts.

opened by ashishu007 6
Explain relation to paperswithcode.com

Since the inception of this great repository of state-of-the-art results, alternatives such as paperswithcode.com have gained traction. This raises the question of the usefulness of keeping both resources up to date with the latest results. Could users and maintainers of this repository perhaps elaborate a bit, here and/or the README, how they see this resource relating to paperswithcode.com and particularly what nlpprogress.com does well that the former does not?

opened by cwenner 6
add TCAN results to LM

To be honest, I'm a bit skeptical about their results and asked them some questions via email. So let's put a hold on this pull request for now (unless the maintainers think it's fine), and I will update it once they have answered my questions.

opened by Separius 6
Add missing LM SOTA result + # params + prev SOTA

Add missing LM ensemble which is SOTA for PTB. Add second-in-line LM SOTA for strict interpretation. Add number of params for LM results.

(unsure why it lists commits that have already been merged)

opened by cwenner 6
Data in YAML for structure and plots
Related to #43.

So far I have done a demo for CCG. I didn't work on the plot form; I just wanted to show that it is possible and easy. Also, I think the data format can be standardized, so it would be simpler to add more complicated things (e.g. further comments, links to multiple implementations, etc.).

See files in:

_data - data in YAML format

_includes - for ways of converting data into its presentations (tables, charts, etc)

ccg_supertagging.md to see how to include these

IMHO YAML is cleaner for writing and reading than Markdown tables, so it is an advantage on its own. In my experience, contributors (ones who use GitHub) have not the slightest problem using YAML (see https://p.migdal.pl/interactive-machine-learning-list/).

Right now I generate the data through a Liquid template.

D3.js Visualizations Using YAML and Jekyll - generating JSON via Liquid (but it is kind of ugly)

loading YAML with js-yaml and then using D3.js (or Vue.js, or other library)
opened by stared 6
Pull request with new emotion detection dataset

There seem to be some conflicts, so I am not resolving them myself, as that might remove some code. Could you kindly resolve them and merge my request?

opened by KhondokerIslam 0
Update paraphrase-generation.md

MULTIPIT, MULTIPITCROWD and MULTIPITEXPERT

Past efforts on creating paraphrase corpora only consider one paraphrase criterion without taking into account the fact that the desired “strictness” of semantic equivalence in paraphrases varies from task to task (Bhagat and Hovy, 2013; Liu and Soh, 2022). For example, for the purpose of tracking unfolding events, “A tsunami hit Haiti.” and “303 people died because of the tsunami in Haiti” are sufficiently close to be considered as paraphrases; whereas for paraphrase generation, the extra information “303 people dead” in the latter sentence may lead models to learn to hallucinate and generate more unfaithful content. In this paper, the authors present an effective data collection and annotation method to address these issues.

MULTIPIT is a Multi-Topic Paraphrase in Twitter corpus that consists of a total of 130k sentence pairs with crowdsourcing (MULTIPITCROWD) and expert (MULTIPITEXPERT) annotations. MULTIPITCROWD is a large crowdsourced set of 125K sentence pairs that is useful for tracking information on Twitter.

| Model | F1 | Paper / Source | Code |
| ------------- | :-----:| --- | --- |
| DeBERTaV3large | 92.00 | Improving Large-scale Paraphrase Acquisition and Generation | Unavailable |

MULTIPITEXPERT is an expert-annotated set of 5.5K sentence pairs using a stricter definition that is more suitable for acquiring paraphrases for generation purposes.

| Model | F1 | Paper / Source | Code |
| ------------- | :-----:| --- | --- |
| DeBERTaV3large | 83.20 | Improving Large-scale Paraphrase Acquisition and Generation | Unavailable |

opened by adrienpayong 0
Add this to machine translation. Is it okay?

WMT22 shared task on Large-Scale Machine Translation Evaluation for African Languages

| Model | BLEU | Paper / Source |
| ------------- | :-----:| --- |
| vanilla MNMT models | 17.95 | Tencent’s Multilingual Machine Translation System for WMT22 Large-Scale African Languages |

opened by adrienpayong 0

Releases (v0.3)

v0.3 (Mar 26, 2020)
Updates during the last month include:

New results on English-Hindi machine translation

New results on intent detection (with code)

State-of-the-art results on CNN / Daily Mail summarization

State-of-the-art results on coreference resolution

State-of-the-art language modelling results on WikiText-103

A new corpus for query-based abstractive snippet generation

v0.2 (Feb 24, 2020)
Updates during the last month include:

Code for the Mogrifier LSTM (ICLR 2020), SOTA on PTB and WikiText-2 is now online

New SOTA models for text simplification

New dataset on Gendered Ambiguous Pronoun (GAP) resolution

Results from the Dialogue System Technology Challenge 8, SOTA on the Ubuntu IRC data

New reading comprehension datasets in French, Russian, Chinese, and Korean

v0.1 (Jan 18, 2020)
This is the first monthly release of NLP Progress.

Updates during the last month include:

A new state of the art on summarisation datasets

New competitive LM results

A new state of the art on coreference resolution

A new semantic parsing task, UCCA parsing

A new state of the art on AMR parsing

A new task, spoken language understanding

XSum, a new dataset for summarisation


Owner

Sebastian Ruder

Research Scientist @DeepMind

GitHub Repository https://nlpprogress.com/

Table of contents

English

Vietnamese

Hindi

Chinese

French

Russian

Spanish

Portuguese

Korean

Nepali

Bengali

Persian

Turkish

German
