The Easy-to-use Dialogue Response Selection Toolkit for Researchers

Overview

Easy-to-use toolkit for retrieval-based Chatbot

Our released data can be found at this link. Make sure the following steps are adopted to use our codes.

How to Use

  1. Init the repo

    Before using the repo, please run the following command to init:

    # create the necessay folders
    python init.py
    
    # prepare the environment
    # if some package cannot be installed, just google and install it from other ways
    pip install -r requirements.txt
  2. train the model

    ./scripts/train.sh <dataset_name> <model_name> <cuda_ids>
  3. test the model [rerank]

    ./scripts/test_rerank.sh <dataset_name> <model_name> <cuda_id>
  4. test the model [recal]

    # different recall_modes are available: q-q, q-r
    ./scripts/test_recall.sh <dataset_name> <model_name> <cuda_id>
  5. inference the responses and save into the faiss index

    Somethings inference will missing data samples, please use the 1 gpu (faiss-gpu search use 1 gpu quickly)

    It should be noted that: 1. For writer dataset, use extract_inference.py script to generate the inference.txt 2. For other datasets(douban, ecommerce, ubuntu), just cp train.txt inference.txt. The dataloader will automatically read the test.txt to supply the corpus.

    # work_mode=response, inference the response and save into faiss (for q-r matching) [dual-bert/dual-bert-fusion]
    # work_mode=context, inference the context to do q-q matching
    # work_mode=gray, inference the context; read the faiss(work_mode=response has already been done), search the topk hard negative samples; remember to set the BERTDualInferenceContextDataloader in config/base.yaml
    ./scripts/inference.sh <dataset_name> <model_name> <cuda_ids>

    If you want to generate the gray dataset for the dataset:

    # 1. set the mode as the **response**, to generate the response faiss index; corresponding dataset name: BERTDualInferenceDataset;
    ./scripts/inference.sh <dataset_name> response <cuda_ids>
    
    # 2. set the mode as the **gray**, to inference the context in the train.txt and search the top-k candidates as the gray(hard negative) samples; corresponding dataset name: BERTDualInferenceContextDataset
    ./scripts/inference.sh <dataset_name> gray <cuda_ids>
    
    # 3. set the mode as the **gray-one2many** if you want to generate the extra positive samples for each context in the train set, the needings of this mode is the same as the **gray** work mode
    ./scripts/inference.sh <dataset_name> gray-one2many <cuda_ids>

    If you want to generate the pesudo positive pairs, run the following commands:

    # make sure the dual-bert inference dataset name is BERTDualInferenceDataset
    ./scripts/inference.sh <dataset_name> unparallel <cuda_ids>
  6. deploy the rerank and recall model

    # load the model on the cuda:0(can be changed in deploy.sh script)
    ./scripts/deploy.sh <cuda_id>

    at the same time, you can test the deployed model by using:

    # test_mode: recall, rerank, pipeline
    ./scripts/test_api.sh <test_mode> <dataset>
  7. test the recall performance of the elasticsearch

    Before testing the es recall, make sure the es index has been built:

    # recall_mode: q-q/q-r
    ./scripts/build_es_index.sh <dataset_name> <recall_mode>
    # recall_mode: q-q/q-r
    ./scripts/test_es_recall.sh <dataset_name> <recall_mode> 0
  8. simcse generate the gray responses

    # train the simcse model
    ./script/train.sh <dataset_name> simcse <cuda_ids>
    # generate the faiss index, dataset name: BERTSimCSEInferenceDataset
    ./script/inference_response.sh <dataset_name> simcse <cuda_ids>
    # generate the context index
    ./script/inference_simcse_response.sh <dataset_name> simcse <cuda_ids>
    # generate the test set for unlikelyhood-gen dataset
    ./script/inference_simcse_unlikelyhood_response.sh <dataset_name> simcse <cuda_ids>
    # generate the gray response
    ./script/inference_gray_simcse.sh <dataset_name> simcse <cuda_ids>
    # generate the test set for unlikelyhood-gen dataset
    ./script/inference_gray_simcse_unlikelyhood.sh <dataset_name> simcse <cuda_ids>
Owner
GMFTBY
Those who are crazy enough to think they can change the world are the ones who can.
GMFTBY
You can share your Chegg account for answers using this bot with your friends without getting your account blocked/flagged

Chegg-Answer-Bot You can share your Chegg account for answers using this bot with your friends without getting your account blocked/flagged Reuirement

Ammey Saini 27 Dec 24, 2022
Rbx-mass-send - mass sends trades to item owners

mass sends trades to item owners proxies should be in ip:port format itemsToSend

0 Feb 20, 2022
Red-mail - Advanced email sending library for Python

Red Mail Next generation email sender What is it? Red Mail is an advanced email

Mikael Koli 313 Jan 08, 2023
A Discord Bot coded using Python. Open to collaboration

DisPy-Bot A Discord Bot coded using Python. Open to collaboration La syntax pour intégrer le bot (imaginons la fonction lol_reponse dans le fichier au

BiMathAx 2 Mar 03, 2022
Visualize size of directories, s3 buckets.

Dir Sizer This is a work in progress, right now consider this an Alpha or Proof of Concept level. dir_sizer is a utility to visualize the size of a di

Scott Seligman 13 Dec 08, 2022
Minecraft checker

This Project checks if a minecraft account is a nfa/sfa account or invalid it also says you if the ip you are using is shadow banned from minecraft (shadow bann is if you send too many login attempts

baum1810 4 Oct 03, 2022
A simple Facebook Account generator, written in python (needs different Email so Accounts do not get banned)

FacebookAccountGenerator FAB is a Facebook-Account generating script, written in python Installation Use the package manager pip to install selenium p

MrOverload 7 Jan 05, 2023
Transcript-Extractor-Bot - Yet another Telegram Voice Recognition bot but using vosk and supports 20+ languages

transcript extractor Yet another Telegram Voice Recognition bot but using vosk a

6 Oct 21, 2022
Project made to analyse movie trends

MovieTrends Project to analyse the daily movie trends from the website The Movie DataBase. The main idea is upload the results to a PostgreSQL server

Jazmín López Chacón 0 Feb 15, 2022
TikTok 4L and 4C checker that doesn't count banned usernames as available

TikTok 4L and 4C checker that doesn't count banned usernames as available. Once a username is available, it will send it to your Discord Webhook.

cliphd 26 May 01, 2022
A simple and easy to use musicbot in python and it uses lavalink.

Lavalink-MusicBot A simple and easy to use musicbot in python and it uses lavalink. ✨ Features plays music in your discord server well thats it i gues

Afnan 1 Nov 29, 2021
A method to check whether a Discord user is using the client or not.

Discord Captcha Method This is an example, of a verification trough a check, if the user loads the picture send with the verification-message. This ma

Julien 2 Jan 19, 2022
A discord bot thet lets you play Space invaders.

space_Invaders A discord bot thet lets you play Space invaders. It is my first discord bot... so please give any suggestions to improve it :] Commands

2 Dec 30, 2021
Script to automatically book a vaccine slot on Doctolib for today or tomorrow, following rules from the French Government.

DOCTOSHOTGUN This script lets you automatically book a vaccine slot on Doctolib for today or tomorrow, following rules from the French Government. Pyt

Romain Bignon 560 Dec 19, 2022
🚀🔥使用Python连接阿里云盘, 实现了官方大部分功能 👍👍

aligo 🚀 🔥 使用Python连接阿里云盘, 实现了官方大部分功能 👍 👍 为了完善代码提示, 方便大家代码书写, aligo 引入了一些 python 3.8 的新特性, 所以要求 python = 3.8.* pip install aligo 或 pip install ali

455 Jan 08, 2023
Spacecrypto-bot - SpaceCrypto Bot Auto Clicker

SpaceCrypto Auto Clicker Bot Também fiz um para Luna Rush ( https://github.com/w

Walter Discher Cechinel 5 Feb 22, 2022
Python library wrapping and enhancing the Invenio RDM REST API.

Iridium The metal Iridium is used to refine and enhance metal alloys. Similarly, this package provides an enhanced coating around the Invenio RDM APIs

Materials Data Science and Informatics 2 Mar 29, 2022
JAWS Pankration 2021 - DDD on AWS Lambda sample

JAWS Pankration 2021 - DDD on AWS Lambda sample What is this project? This project contains sample code for AWS Lambda with domain models. I presented

Atsushi Fukui 21 Mar 30, 2022
Project glow is an open source bot worked on by many people to create a good and safe moderation bot for all

Project Glow Greetings, I see you have stumbled upon project glow. Project glow is an open source bot worked on by many people to create a good and sa

Glowstikk 24 Sep 29, 2022
A Telegram Message Manager Bot by @AbirHasan2005

Messages-Manager-Bot A Telegram Message Manager Bot by @AbirHasan2005. This Bot can delete specific type of messages from Group. I specially use for @

Abir Hasan 32 Nov 12, 2022