Official repository for "Action-Based Conversations Dataset: A Corpus for Building More In-Depth Task-Oriented Dialogue Systems"

Related tags

Deep Learningabcd
Overview

Action-Based Conversations Dataset (ABCD)

This respository contains the code and data for ABCD (Chen et al., 2021)

Introduction

Whereas existing goal-oriented dialogue datasets focus mainly on identifying user intents, customer interactions in reality often involve agents following multi-step procedures derived from explicitly-defined guidelines. For example, in a online shopping scenario, a customer might request a refund for a past purchase. However, before honoring such a request, the agent should check the company policies to see if a refund is warranted. It is very likely that the agent will need to verify the customer's identity and check that the purchase was made within a reasonable timeframe.

To study dialogue systems in more realistic settings, we introduce the Action-Based Conversations Dataset (ABCD), where an agent's actions must be balanced between the desires expressed by the customer and the constraints set by company policies. The dataset contains over 10K human-to-human dialogues with 55 distinct user intents requiring unique sequences of actions to achieve task success. We also design a new technique called Expert Live Chat for collecting data when there are two unequal users engaging in real-time conversation. Please see the paper for more details.

Paper link: https://arxiv.org/abs/2104.00783

Blog link: https://www.asapp.com/blog/action-based-conversations-dataset/

Agent Dashboard

Customer Site

Usage

All code is run by executing the corresponding command within the shell script run.sh, which will kick off the data preparation and training within main.py. To use, first unzip the file found in data/abcd_v1.1.json.gz using the gunzip command (or similar). Then comment or uncomment the appropriate lines in the shell script to get desired behavior. Finally, enter sh run.sh into the command line to get started. Use the --help option of argparse for flag details or read through the file located within utils/arguments.py.

Preparation

Raw data will be loaded from the data folder and prepared into features that are placed into Datasets. If this has already occured, then the system will instead read in the prepared features from cache.

If running CDS for the first time, uncomment out the code within the run script to execute embed.py which will prepare the utterances for ranking.

Training

To specify the task for training, simply use the --task option with either ast or cds, for Action State Tracking and Cascading Dialogue Success respectively. Options for different model types are bert, albert and roberta. Loading scripts can be tuned to offer various other behaviors.

Evaluation

Activate evaluation using the --do-eval flag. By default, run.sh will perform cascading evaluation. To include ablations, add the appropriate options of --use-intent or --use-kb.

Data

The preprocessed data is found in abcd_v1.1.json which is a dictionary with keys of train, dev and test. Each split is a list of conversations, where each conversation is a dict containing:

  • convo_id: a unique conversation identifier
  • scenario: the ground truth scenario used to generate the prompt
  • original: the raw conversation of speaker and utterances as a list of tuples
  • delexed: the delexicalized conversation used for training and evaluation, see below for details

We provide the delexed version so new models performing the same tasks have comparable pre-processing. The original data is also provided in case you want to use the utterances for some other purpose.

For a quick preview, a small sample of chats is provided to help get started. Concretely, abcd_sample.json is a list containing three random conversations from the training set.

Scenario

Each scene dict contains details about the customer setup along with the underlying flow and subflow information an agent should use to address the customer concern. The components are:

  • Personal: personal data related to the (fictional) customer including account_id, customer name, membership level, phone number, etc.
  • Order: order info related to what the customer purchased or would like to purchase. Includes address, num_products, order_id, product names, and image info
  • Product: product details if applicable, includes brand name, product type and dollar amount
  • Flow and Subflow: these represent the ground truth user intent. They are used to generate the prompt, but are not shown directly the customer. The job of the agent is to infer this (latent) intent and then match against the Agent Guidelines to resolve the customer issue.

Guidelines

The agent guidelines are offered in their original form within Agent Guidelines for ABCD. This has been transformed into a formatted document for parsing by a model within data/guidelines.json. The intents with their button actions about found within kb.json. Lastly, the breakdown of all flows, subflows, and actions are found within ontology.json.

Conversation

Each conversation is made up of a list of turns. Each turn is a dict with five parts:

  • Speaker: either "agent", "customer" or "action"
  • Text: the utterance of the agent/customer or the system generated response of the action
  • Turn_Count: integer representing the turn number, starting from 1
  • Targets : list of five items representing the subtask labels
    • Intent Classification (text) - 55 subflow options
    • Nextstep Selection (text) - take_action, retrieve_utterance or end_conversation; 3 options
    • Action Prediction (text) - the button clicked by the agent; 30 options
    • Value Filling (list) - the slot value(s) associated with the action above; 125 options
    • Utterance Ranking (int) - target position within list of candidates; 100 options
  • Candidates: list of utterance ids representing the pool of 100 candidates to choose from when ranking. The surface form text can be found in utterances.json where the utt_id is the index. Only applicable when the current turn is a "retrieve_utterance" step.

In contrast to the original conversation, the delexicalized version will replace certain segments of text with special tokens. For example, an utterance might say "My Account ID is 9KFY4AOHGQ". This will be changed into "my account id is <account_id>".

Contact

Please email [email protected] for questions or feedback.

Citation

@inproceedings{chen2021abcd,
    title = "Action-Based Conversations Dataset: A Corpus for Building More In-Depth Task-Oriented Dialogue Systems",
    author = "Chen, Derek and
        Chen, Howard and
        Yang, Yi and
        Lin, Alex and
        Yu, Zhou",
    booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for 
    	Computational Linguistics: Human Language Technologies, {NAACL-HLT} 2021",
    month = jun,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2021.naacl-main.239",
    pages = "3002--3017"
}
Owner
ASAPP Research
AI for Enterprise
ASAPP Research
This repo generates the training data and the model for Morpheus-Deblend

Morpheus-Deblend This repo generates the training data and the model for Morpheus-Deblend. This is the active development repo for the project and as

Ryan Hausen 2 Apr 18, 2022
torchsummaryDynamic: support real FLOPs calculation of dynamic network or user-custom PyTorch ops

torchsummaryDynamic Improved tool of torchsummaryX. torchsummaryDynamic support real FLOPs calculation of dynamic network or user-custom PyTorch ops.

Bohong Chen 1 Jan 07, 2022
Codes for the AAAI'22 paper "TransZero: Attribute-guided Transformer for Zero-Shot Learning"

TransZero [arXiv] This repository contains the testing code for the paper "TransZero: Attribute-guided Transformer for Zero-Shot Learning" accepted to

Shiming Chen 52 Jan 01, 2023
CNNs for Sentence Classification in PyTorch

Introduction This is the implementation of Kim's Convolutional Neural Networks for Sentence Classification paper in PyTorch. Kim's implementation of t

Shawn Ng 956 Dec 19, 2022
A chemical analysis of lipophilicities & molecule drawings including ML

A chemical analysis of lipophilicity & molecule drawings including a bit of ML analysis. This is a simple project that includes two Jupyter files (one

Aurimas A. NausÄ—das 7 Nov 22, 2022
PyTorch implementation of "Simple and Deep Graph Convolutional Networks"

Simple and Deep Graph Convolutional Networks This repository contains a PyTorch implementation of "Simple and Deep Graph Convolutional Networks".(http

chenm 253 Dec 08, 2022
MWPToolkit is a PyTorch-based toolkit for Math Word Problem (MWP) solving.

MWPToolkit is a PyTorch-based toolkit for Math Word Problem (MWP) solving. It is a comprehensive framework for research purpose that integrates popular MWP benchmark datasets and typical deep learnin

119 Jan 04, 2023
Online-compatible Unsupervised Non-resonant Anomaly Detection Repository

Online-compatible Unsupervised Non-resonant Anomaly Detection Repository Repository containing all scripts used in the studies of Online-compatible Un

0 Nov 09, 2021
MinkLoc3D-SI: 3D LiDAR place recognition with sparse convolutions,spherical coordinates, and intensity

MinkLoc3D-SI: 3D LiDAR place recognition with sparse convolutions,spherical coordinates, and intensity Introduction The 3D LiDAR place recognition aim

16 Dec 08, 2022
This is an official implementation for "SimMIM: A Simple Framework for Masked Image Modeling".

SimMIM By Zhenda Xie*, Zheng Zhang*, Yue Cao*, Yutong Lin, Jianmin Bao, Zhuliang Yao, Qi Dai and Han Hu*. This repo is the official implementation of

Microsoft 674 Dec 26, 2022
Codes for CyGen, the novel generative modeling framework proposed in "On the Generative Utility of Cyclic Conditionals" (NeurIPS-21)

On the Generative Utility of Cyclic Conditionals This repository is the official implementation of "On the Generative Utility of Cyclic Conditionals"

Chang Liu 44 Nov 16, 2022
Hepsiburada - Hepsiburada Urun Bilgisi Cekme

Hepsiburada Urun Bilgisi Cekme from hepsiburada import Marka nike = Marka("nike"

Ilker Manap 8 Oct 26, 2022
Multi Camera Calibration

Multi Camera Calibration 'modules/camera_calibration/app/camera_calibration.cpp' is for calculating extrinsic parameter of each individual cameras. 'm

7 Dec 01, 2022
Differentiable Neural Computers, Sparse Access Memory and Sparse Differentiable Neural Computers, for Pytorch

Differentiable Neural Computers and family, for Pytorch Includes: Differentiable Neural Computers (DNC) Sparse Access Memory (SAM) Sparse Differentiab

ixaxaar 302 Dec 14, 2022
Explainability of the Implications of Supervised and Unsupervised Face Image Quality Estimations Through Activation Map Variation Analyses in Face Recognition Models

Explainable_FIQA_WITH_AMVA Note This is the official repository of the paper: Explainability of the Implications of Supervised and Unsupervised Face I

3 May 08, 2022
Expressive Body Capture: 3D Hands, Face, and Body from a Single Image

Expressive Body Capture: 3D Hands, Face, and Body from a Single Image [Project Page] [Paper] [Supp. Mat.] Table of Contents License Description Fittin

Vassilis Choutas 1.3k Jan 07, 2023
Tackling Obstacle Tower Challenge using PPO & A2C combined with ICM.

Obstacle Tower Challenge using Deep Reinforcement Learning Unity Obstacle Tower is a challenging realistic 3D, third person perspective and procedural

Zhuoyu Feng 5 Feb 10, 2022
RSNA Intracranial Hemorrhage Detection with python

RSNA Intracranial Hemorrhage Detection This is the source code for the first place solution to the RSNA2019 Intracranial Hemorrhage Detection Challeng

24 Nov 30, 2022
[NeurIPS'21] Projected GANs Converge Faster

[Project] [PDF] [Supplementary] [Talk] This repository contains the code for our NeurIPS 2021 paper "Projected GANs Converge Faster" by Axel Sauer, Ka

798 Jan 04, 2023
3D ResNet Video Classification accelerated by TensorRT

Activity Recognition TensorRT Perform video classification using 3D ResNets trained on Kinetics-400 dataset and accelerated with TensorRT P.S Click on

Akash James 39 Nov 21, 2022