The Few-Shot Bot: Prompt-Based Learning for Dialogue Systems



This repository includes the dataset, experimental results, and code for the paper:

Few-Shot Bot: Prompt-Based Learning for Dialogue Systems PDF.

Authors: Andrea Madotto, Zhaojiang Lin, Genta Indra Winata, Pascale Fung

Abstract

Learning to converse using only a few examples is a grand challenge in Conversational AI. The current best conversational models, which are either good chit-chatters (e.g., BlenderBot) or goal-oriented systems (e.g., MinTL), are language models (LMs) fine-tuned on large conversational datasets. Training these models is expensive, both in terms of computational resources and time, and it is hard to keep these models up to date with new conversational skills. A simple yet unexplored solution is prompt-based few-shot learning (Brown et al. 2020) which does not require gradient-based fine-tuning but instead uses a few examples in the LM context as the only source of learning. In this paper, we explore prompt-based few-shot learning in dialogue tasks. We benchmark LMs of different sizes in 9 response generation tasks, which include a variety of knowledge-grounded tasks, task-oriented generations, general open-chat, and controlled stylistic generation, and 5 conversational parsing tasks, which include dialogue state tracking, graph path generation, persona information extraction, and document retrieval. The current largest, released, LM (GPT-J-6B) achieves competitive performance to full-training state-of-the-art models by using the prompt-based few-shot learning, thus no training. Moreover, we proposed a novel perplexity-based classifier, that also does not require any fine-tuning, to select the most appropriate prompt given a dialogue history, as to create an all-in-one model with multiple dialogue skills. Finally, by combining the power of prompt-based few-shot learning and the skill selector, we create an end-to-end chatbot named the Few-Shot Bot, which automatically selects the most appropriate conversational skill, queries different KBs or the internet, and uses it to generate a human-like response, all by using only one dialogue example per skill.

Installation

This repo includes all the validation and test sets used in the evaluation. To run the experiments and the demo, install the requirements:

pip install -r requirements.txt

Basic Running

Reproducing the results and plots

The generation folder stores the generated responses of the experiments on all datasets. To generate the tables and plots in the paper, run:

python generate_plots_tables.py

This script loads all the files, computes the mean across the different runs, and generates the plots. Note that the script is heavily customized for each dataset, but it can serve as a guideline for future extensions.
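The averaging step can be sketched as follows. This is only an illustration, not the code in generate_plots_tables.py: the function name `average_runs`, the metric names, and the one-dict-per-run layout are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def average_runs(runs):
    """Average every metric over several runs.

    `runs` is assumed to be a list with one {metric: score} dict per run.
    """
    collected = defaultdict(list)
    for run in runs:
        for metric, score in run.items():
            collected[metric].append(score)
    return {metric: mean(scores) for metric, scores in collected.items()}


# Hypothetical scores from three repetitions of the same experiment.
runs = [
    {"bleu": 1.0, "f1": 10.0},
    {"bleu": 2.0, "f1": 20.0},
    {"bleu": 3.0, "f1": 30.0},
]
averaged = average_runs(runs)
```

Grouping scores by metric first keeps the sketch independent of how many runs or metrics a given dataset reports.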

Running the experiments

There are three main files to run: 1) response generation (main_response_generation.py), 2) conversational parsing (main_conversational_parsing.py), and 3) skill selector (main_skill_selector.py). In these files, we load the necessary prompt (load_prefix) and run the generation (generate_response) for each sample in the test set. Since each dialogue skill requires a different template, as shown in the paper, we create a function that converts structured data into the correct shot prompt. An example of this function can be found in prompts/persona_chat.py; the generation functions are stored in generic_prompts.py.
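As a rough sketch of what such a converter might look like, the example below renders a persona-style sample as text and stacks k solved shots in front of the unsolved test sample. The function names, the field names ("meta", "dialogue"), and the exact template strings are assumptions for illustration; the actual template is the one in prompts/persona_chat.py.

```python
def convert_sample_to_shot(sample, with_answer=True):
    """Render one structured dialogue as prompt text.

    `sample` is assumed to hold a "meta" list of persona sentences and a
    "dialogue" list of [user_utterance, system_response] pairs.
    """
    lines = ["Persona:"]
    lines += [f"- {fact}" for fact in sample["meta"]]
    lines.append("Dialogue:")
    last_turn = len(sample["dialogue"]) - 1
    for turn_id, (user, system) in enumerate(sample["dialogue"]):
        lines.append(f"User: {user}")
        if turn_id == last_turn and not with_answer:
            # Leave the final response open so the LM completes it.
            lines.append("Persona:")
        else:
            lines.append(f"Persona: {system}")
    return "\n".join(lines)


def build_prompt(shots, query):
    """Concatenate the solved shots, then the test sample without its answer."""
    parts = [convert_sample_to_shot(s, with_answer=True) for s in shots]
    parts.append(convert_sample_to_shot(query, with_answer=False))
    return "\n\n".join(parts)


# Hypothetical one-shot example.
shot = {"meta": ["i like dogs."], "dialogue": [["hi", "hello"]]}
query = {"meta": ["i like tea."], "dialogue": [["how are you?", ""]]}
prompt = build_prompt([shot], query)
```

With zero shots, `build_prompt([], query)` returns only the open-ended test sample, which corresponds to the 0-shot setting.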

In each main file there is a configuration object (mapper) which specifies meta-information about the task (i.e., number of shots, generation length, decoding type, prompt converter). Especially for conversational parsing, there are different decoding types. For example, in MWOZ the model generates the dialogue state, which is then fed into the next turn.
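The sketch below illustrates the idea of a per-task configuration and of feeding a generated dialogue state into the next turn. It is not the repository's mapper: `TaskConfig`, its field names, the "mwoz-parse" key, and the stub `fake_generate` (which stands in for the language model) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TaskConfig:
    shots: List[int]            # numbers of shots to evaluate
    max_new_tokens: int         # generation length
    carry_state: bool           # hypothetical flag: feed output into next turn
    prompt_converter: Callable[[dict], str]


def state_prompt(sample):
    """Hypothetical template: previous state, user turn, then an open state."""
    return f"State: {sample['state']}\nUser: {sample['user']}\nState:"


mapper = {
    "mwoz-parse": TaskConfig(shots=[0, 1, 5], max_new_tokens=50,
                             carry_state=True, prompt_converter=state_prompt),
}


def run_dialogue(turns, config, generate):
    """Generate one output per turn, optionally looping it into the next turn."""
    state, outputs = "", []
    for user in turns:
        prompt = config.prompt_converter({"user": user, "state": state})
        output = generate(prompt, config.max_new_tokens)
        outputs.append(output)
        if config.carry_state:
            state = output
    return outputs


def fake_generate(prompt, max_new_tokens):
    """Stand-in for the LM: appends the user text to the previous state."""
    state_line, user_line, _ = prompt.split("\n")
    previous = state_line[len("State: "):]
    user = user_line[len("User: "):]
    return (previous + " " + user).strip()


states = run_dialogue(["hotel", "cheap"], mapper["mwoz-parse"], fake_generate)
```

Because the state is carried forward, the second output depends on the first; with `carry_state=False` each turn would be parsed in isolation.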

How to run?

For example, to run the persona chat experiments (0, 1, k-shots), you can use the following command:

python main_response_generation.py --model_checkpoint EleutherAI/gpt-j-6B --dataset persona --gpu 0

If your GPU has less than 16 GB of memory, you can add --multigpu to use 4 GPUs (e.g., 1080Ti) and do inference in parallel. Similarly, for conversational parsing tasks, you could use:

python main_conversational_parsing.py --model_checkpoint EleutherAI/gpt-j-6B --dataset wow-parse --gpu 0

Note that some parsing tasks require a knowledge base (e.g., dialKG-parse requires the KG in neo4j). Finally, to run the skill-selector task, you could use:

python main_skill_selector.py --model_checkpoint EleutherAI/gpt-j-6B --shots_k 6 --repetition 1 --gpu 0

where repetition is the seed for selecting random samples in the prompts.
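The paper describes the skill selector as a perplexity-based classifier that picks the most appropriate prompt for a dialogue history. A minimal sketch of that idea follows, assuming that the history is scored under each skill's prompt and the skill with the lowest perplexity wins. The function names are hypothetical, and `toy_token_logprobs` is a word-overlap stand-in for the language model's per-token log-probabilities, not the scoring used in main_skill_selector.py.

```python
import math


def perplexity(logprobs):
    """Perplexity = exp of the negative mean token log-probability."""
    return math.exp(-sum(logprobs) / len(logprobs))


def select_skill(prompts, history, token_logprobs):
    """Return the skill whose prompt gives the history the lowest perplexity.

    `prompts` maps skill name -> few-shot prompt text; `token_logprobs`
    is any callable (prefix, text) -> list of per-token log-probabilities.
    """
    scores = {
        skill: perplexity(token_logprobs(prefix, history))
        for skill, prefix in prompts.items()
    }
    return min(scores, key=scores.get), scores


def toy_token_logprobs(prefix, text):
    """Stand-in for an LM: words already seen in the prefix are likelier."""
    seen = set(prefix.lower().split())
    return [math.log(0.5) if word in seen else math.log(0.01)
            for word in text.lower().split()]


# Hypothetical one-line prompts for two skills.
prompts = {
    "weather": "User: will it rain tomorrow",
    "music": "User: play a jazz song",
}
best, scores = select_skill(prompts, "will it rain today", toy_token_logprobs)
```

With a real LM, `token_logprobs` would be replaced by a forward pass that returns the log-probability of each history token conditioned on the prompt; no gradient update is involved.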

Runners

In the runners folder, we provide a rudimentary runner to run all the experiments and reproduce the results in the paper.

Few-Shot Bot

There are two modes for the FSB: 1) controlled style generation and 2) full model. Currently we support the controlled style generation mode. Check FSB-CG.ipynb to interact with FSB on your local machine, or try it directly in Colab at https://colab.research.google.com/drive/15hQv1V3Cs5kQVfLOE_FZc1VCWQ3YpWVd?usp=sharing (remember to select an environment with a GPU).
