An Analysis Toolkit for Natural Language Generation (Translation, Captioning, Summarization, etc.)

Last update: Oct 28, 2022

Overview

VizSeq is a Python toolkit for visual analysis of text generation tasks such as machine translation, summarization, image captioning, speech translation and video description. It takes multi-modal sources, text references and text predictions as inputs, and analyzes them visually in Jupyter Notebook or in a built-in Web App (the former has Fairseq integration). VizSeq also provides a collection of multi-process scorers as a regular Python package.

[Paper] [Documentation] [Blog]

Task Coverage

Source	Example Tasks
Text	Machine translation, text summarization, dialog generation, grammatical error correction, open-domain question answering
Image	Image captioning, image question answering, optical character recognition
Audio	Speech recognition, speech translation
Video	Video description
Multimodal	Multimodal machine translation

Metric Coverage

Accelerated with multi-processing/multi-threading.

Type	Metrics
N-gram-based	BLEU (Papineni et al., 2002), NIST (Doddington, 2002), METEOR (Banerjee et al., 2005), TER (Snover et al., 2006), RIBES (Isozaki et al., 2010), chrF (Popović et al., 2015), GLEU (Wu et al., 2016), ROUGE (Lin, 2004), CIDEr (Vedantam et al., 2015), WER
Embedding-based	LASER (Artetxe and Schwenk, 2018), BERTScore (Zhang et al., 2019)
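To make the idea of a parallel sentence-level scorer concrete, here is a minimal standard-library sketch of WER (word-level edit distance divided by reference length) mapped over a corpus with a worker pool. It is not VizSeq's implementation: the names word_error_rate and score_sentences are hypothetical, and threads are used only to keep the sketch short (a process pool would be the usual choice for CPU-bound scoring).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def word_error_rate(hypothesis: str, reference: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        # Convention assumed for this sketch: an empty reference scores 0.0
        # only when the hypothesis is empty too, otherwise 1.0.
        return 0.0 if not hyp else 1.0
    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def score_sentences(hypotheses: Sequence[str],
                    references: Sequence[str],
                    n_workers: int = 2) -> List[float]:
    """Score each hypothesis/reference pair, spreading pairs over workers."""
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must have equal length")
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves input order, so scores line up with sentences.
        return list(pool.map(word_error_rate, hypotheses, references))


if __name__ == "__main__":
    refs = ["the cat sat on the mat", "hello world"]
    hyps = ["the cat sat on mat", "hello world"]
    print(score_sentences(hyps, refs))
```

Sentence-level scores are independent of each other, which is why this kind of metric parallelizes cleanly across workers.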

Getting Started

Installation

VizSeq requires Python 3.6+ and currently runs on Unix/Linux and macOS/OS X. It will support Windows as well in the future.
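If you want to check these requirements from a script before installing, a small sketch along the following lines could work. The helper is_supported is hypothetical (not part of VizSeq), and treating only "win32" as unsupported is an assumption based on the statement above.

```python
import sys
from typing import Sequence


def is_supported(version_info: Sequence[int] = sys.version_info,
                 platform: str = sys.platform) -> bool:
    """Return True for Python 3.6+ on a non-Windows platform."""
    python_ok = tuple(version_info[:2]) >= (3, 6)
    # sys.platform is "win32" on Windows, "darwin" on macOS and
    # "linux" on Linux; only Windows is treated as unsupported here.
    platform_ok = platform != "win32"
    return python_ok and platform_ok


if __name__ == "__main__":
    print("supported" if is_supported() else "not supported")
```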

You can install VizSeq from the PyPI repository:

$ pip install vizseq

Or install it from source:

$ git clone https://github.com/facebookresearch/vizseq
$ cd vizseq
$ pip install -e .

Documentation

Jupyter Notebook Examples

Fairseq integration

Web App Example

Download example data:

$ git clone https://github.com/facebookresearch/vizseq
$ cd vizseq
$ bash get_example_data.sh

Launch the web server:

$ python -m vizseq.server --port 9001 --data-root ./examples/data

Then navigate to the following URL in your web browser:

http://localhost:9001
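If you prefer to start the server from a Python script rather than the shell, the launch command above can be assembled programmatically. This is only a sketch: build_server_command and server_url are hypothetical helpers, and the only flags used are --port and --data-root, exactly as in the command shown above.

```python
import sys
from typing import List


def build_server_command(port: int = 9001,
                         data_root: str = "./examples/data") -> List[str]:
    """Build the argument list equivalent to the shell command above."""
    if not 0 < port < 65536:
        raise ValueError("port must be between 1 and 65535")
    # sys.executable makes sure the same interpreter (and environment)
    # that has VizSeq installed is used to run the module.
    return [sys.executable, "-m", "vizseq.server",
            "--port", str(port), "--data-root", data_root]


def server_url(port: int = 9001, host: str = "localhost") -> str:
    """URL to open in the browser once the server is running."""
    return f"http://{host}:{port}"


if __name__ == "__main__":
    print(" ".join(build_server_command()))
    print(server_url())
```

The returned list can be passed to subprocess.Popen to run the server in the background of a longer script.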

License

VizSeq is licensed under MIT. See the LICENSE file for details.

Citation

Please cite as

@inproceedings{wang2019vizseq,
  title = {VizSeq: A Visual Analysis Toolkit for Text Generation Tasks},
  author = {Changhan Wang and Anirudh Jain and Danlu Chen and Jiatao Gu},
  booktitle = {Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
  year = {2019},
}

Contact

Changhan Wang, Jiatao Gu

Owner

Facebook Research