Findings of ACL 2021

Last update: Feb 24, 2022

Overview

Assessing Dialogue Systems with Distribution Distances

We propose to measure the performance of a dialogue system by computing the distribution-wise distance between its generated conversations and real-world conversations.
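For orientation, one widely used distribution distance of this kind is the Fréchet distance behind FID [1]. The formula below is a sketch under the assumption that features of real and generated conversations are each modeled as a multivariate Gaussian; the symbols are introduced here for illustration, and this README does not spell out the exact form used by the fbd metric.

```latex
d_F^2\big(\mathcal{N}(\mu_r, \Sigma_r),\, \mathcal{N}(\mu_g, \Sigma_g)\big)
  = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\big(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\big)
```

Here \mu_r, \Sigma_r are the mean and covariance of features extracted from real conversations, and \mu_g, \Sigma_g are those of generated conversations. A smaller value means the two distributions are closer.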

To appear in Findings of ACL 2021.

Note that this is not an officially supported Tencent product.

1. Configuration

This repository requires the following packages:

pytorch
huggingface/transformers

2. Usage

To evaluate the system-level human correlations of metrics:

python eval_metric.py \
  --data_path ./datasets/convai2_annotation.json \
  --metric fbd \
  --sample_num 10 \
  --model_type roberta-base \
  --batch_size 32
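
As a rough illustration of what a system-level correlation measures, the sketch below correlates one metric score per dialogue system with that system's mean human rating. It is not the code in eval_metric.py; the function names and the example numbers are hypothetical, and only the Python standard library is used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical numbers: one entry per dialogue system.
human_scores = [4.1, 3.2, 2.5, 3.8]      # mean human rating, higher is better
metric_distances = [0.9, 1.7, 2.6, 1.2]  # distance metric, lower is better

print("pearson :", pearson(human_scores, metric_distances))
print("spearman:", spearman(human_scores, metric_distances))
```

For a distance-style metric, where lower is better, a strongly negative correlation with human ratings indicates good agreement.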

Currently, the repository supports the following metrics commonly used in text generation: bleu, meteor, rouge, greedy, average, extrema, bert_score, fbd, and prd.
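
To run the evaluation once per metric, a small helper can assemble the command lines. This is a sketch: the flags and default values are copied from the usage example, the metric names from the list above, and build_command is a hypothetical helper. Whether every flag (for example --model_type) is meaningful for every metric is an assumption not confirmed here.

```python
import shlex

# Metric names as listed in this README.
METRICS = [
    "bleu", "meteor", "rouge", "greedy", "average",
    "extrema", "bert_score", "fbd", "prd",
]


def build_command(metric,
                  data_path="./datasets/convai2_annotation.json",
                  sample_num=10,
                  model_type="roberta-base",
                  batch_size=32):
    """Return the argument list for one eval_metric.py run."""
    if metric not in METRICS:
        raise ValueError(f"unknown metric: {metric}")
    return [
        "python", "eval_metric.py",
        "--data_path", data_path,
        "--metric", metric,
        "--sample_num", str(sample_num),
        "--model_type", model_type,
        "--batch_size", str(batch_size),
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them, so they can be reviewed
    # first or passed to subprocess.run(..., check=True) from the repo root.
    for metric in METRICS:
        print(shlex.join(build_command(metric)))
```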

Here are some details of the six corpora compared in the main paper:

File Name	Dataset Name	Num. of Samples	Reference
`personam_annotation.json`	Persona(M)	60	Shikib/usr
`dailyh_annotation.json`	Daily(H)	150	li3cmz/GRADE
`convai2_annotation.json`	Convai2	150	li3cmz/GRADE
`empathetic_annotation.json`	Empathetic	150	li3cmz/GRADE
`dailyz_annotation.json`	Daily(Z)	100	ZHAOTING/dialog-processing
`personaz_annotation.json`	Persona(Z)	150	ZHAOTING/dialog-processing

Citation

If you use this research/codebase/dataset, please cite our paper:

@article{xiang2021assessing,
  title={Assessing Dialogue Systems with Distribution Distances},
  author={Xiang, Jiannan and Liu, Yahui and Cai, Deng and Li, Huayang and Lian, Defu and Liu, Lemao},
  journal={arXiv preprint arXiv:2105.02573},
  year={2021}
}

Other related papers:

[1] FID, GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, NIPS 2017
[2] PRD, Assessing Generative Models via Precision and Recall, NIPS 2018
[3] BERTScore, BERTScore: Evaluating Text Generation with BERT, ICLR 2020


Owner

Yahui Liu
