The Easy-to-use Dialogue Response Selection Toolkit for Researchers

Overview

Easy-to-use toolkit for retrieval-based Chatbot

Our released data can be found at this link. Make sure the following steps are adopted to use our codes.

How to Use

  1. Init the repo

    Before using the repo, please run the following command to init:

    # create the necessay folders
    python init.py
    
    # prepare the environment
    # if some package cannot be installed, just google and install it from other ways
    pip install -r requirements.txt
  2. train the model

    ./scripts/train.sh <dataset_name> <model_name> <cuda_ids>
  3. test the model [rerank]

    ./scripts/test_rerank.sh <dataset_name> <model_name> <cuda_id>
  4. test the model [recal]

    # different recall_modes are available: q-q, q-r
    ./scripts/test_recall.sh <dataset_name> <model_name> <cuda_id>
  5. inference the responses and save into the faiss index

    Somethings inference will missing data samples, please use the 1 gpu (faiss-gpu search use 1 gpu quickly)

    It should be noted that: 1. For writer dataset, use extract_inference.py script to generate the inference.txt 2. For other datasets(douban, ecommerce, ubuntu), just cp train.txt inference.txt. The dataloader will automatically read the test.txt to supply the corpus.

    # work_mode=response, inference the response and save into faiss (for q-r matching) [dual-bert/dual-bert-fusion]
    # work_mode=context, inference the context to do q-q matching
    # work_mode=gray, inference the context; read the faiss(work_mode=response has already been done), search the topk hard negative samples; remember to set the BERTDualInferenceContextDataloader in config/base.yaml
    ./scripts/inference.sh <dataset_name> <model_name> <cuda_ids>

    If you want to generate the gray dataset for the dataset:

    # 1. set the mode as the **response**, to generate the response faiss index; corresponding dataset name: BERTDualInferenceDataset;
    ./scripts/inference.sh <dataset_name> response <cuda_ids>
    
    # 2. set the mode as the **gray**, to inference the context in the train.txt and search the top-k candidates as the gray(hard negative) samples; corresponding dataset name: BERTDualInferenceContextDataset
    ./scripts/inference.sh <dataset_name> gray <cuda_ids>
    
    # 3. set the mode as the **gray-one2many** if you want to generate the extra positive samples for each context in the train set, the needings of this mode is the same as the **gray** work mode
    ./scripts/inference.sh <dataset_name> gray-one2many <cuda_ids>

    If you want to generate the pesudo positive pairs, run the following commands:

    # make sure the dual-bert inference dataset name is BERTDualInferenceDataset
    ./scripts/inference.sh <dataset_name> unparallel <cuda_ids>
  6. deploy the rerank and recall model

    # load the model on the cuda:0(can be changed in deploy.sh script)
    ./scripts/deploy.sh <cuda_id>

    at the same time, you can test the deployed model by using:

    # test_mode: recall, rerank, pipeline
    ./scripts/test_api.sh <test_mode> <dataset>
  7. test the recall performance of the elasticsearch

    Before testing the es recall, make sure the es index has been built:

    # recall_mode: q-q/q-r
    ./scripts/build_es_index.sh <dataset_name> <recall_mode>
    # recall_mode: q-q/q-r
    ./scripts/test_es_recall.sh <dataset_name> <recall_mode> 0
  8. simcse generate the gray responses

    # train the simcse model
    ./script/train.sh <dataset_name> simcse <cuda_ids>
    # generate the faiss index, dataset name: BERTSimCSEInferenceDataset
    ./script/inference_response.sh <dataset_name> simcse <cuda_ids>
    # generate the context index
    ./script/inference_simcse_response.sh <dataset_name> simcse <cuda_ids>
    # generate the test set for unlikelyhood-gen dataset
    ./script/inference_simcse_unlikelyhood_response.sh <dataset_name> simcse <cuda_ids>
    # generate the gray response
    ./script/inference_gray_simcse.sh <dataset_name> simcse <cuda_ids>
    # generate the test set for unlikelyhood-gen dataset
    ./script/inference_gray_simcse_unlikelyhood.sh <dataset_name> simcse <cuda_ids>
Owner
GMFTBY
Those who are crazy enough to think they can change the world are the ones who can.
GMFTBY
RaidBot for WhatsApp

WhatsappRaid Скрипт подготовлен специально для сайта https://pysoc.ru и Ютуб канала PyPro Русский Простой спам бот для WhatsApp на Python3. Работает с

2 May 12, 2022
Sample code helps get you started with a simple Python web service using AWS Lambda and Amazon API Gateway

Welcome to the AWS CodeStar sample web service This sample code helps get you started with a simple Python web service using AWS Lambda and Amazon API

0 Jan 20, 2022
A nuker for Roblox accounts.

Roblox-Nuker A nuker for Roblox accounts. Made by Ice Bear#0167 Usage I would recommend running in replit (https://replit.com) as it is deprecated in

7 May 10, 2022
📷 Instagram Bot - Tool for automated Instagram interactions

InstaPy Tooling that automates your social media interactions to “farm” Likes, Comments, and Followers on Instagram Implemented in Python using the Se

Tim Großmann 13.5k Dec 01, 2021
A discord tool to use bugs and exploits

DiscordTool A discord tool to use bugs and exploits Features: send a buggy messa

6 Aug 19, 2022
A Telegram Bot Written In Python

TelegraphUploader A Telegram Bot Written In Python DEPLOY Local Machine Clone the repository Install requirements: pip3 install -r requirements.txt e

Wahyusaputra 2 Dec 29, 2021
Using AWS Batch jobs to bulk copy/sync files in S3

Using AWS Batch jobs to bulk copy/sync files in S3

AWS Samples 14 Sep 19, 2022
Discord Rich Presence for Team Fortress 2

TF2 Rich Presence Discord Rich Presence for Team Fortress 2 Detects current game state, queue info, playtime, and more Configurable, reliable, and per

Kataiser 33 Dec 31, 2022
D(HE)ater is a security tool can perform DoS attack by enforcing the DHE key exchange.

D(HE)ater D(HE)ater is an attacking tool based on CPU heating in that it forces the ephemeral variant of Diffie-Hellman key exchange (DHE) in given cr

Balasys 138 Dec 15, 2022
A simple economy bot for discord!

Enter all the correct values in the given configuration.json file. Make sure that BOT_TOKEN is a string value, and that OWNER_ID and GUILD_ID are integer values.

WonkyPigs 0 Aug 22, 2022
🦊 Powerfull Discord Nitro Generator

🦊 Follow me here 🦊 Discord | YouTube | Github ☕ Usage 💻 Downloading git clone https://github.com/KanekiWeb/Nitro-Generator/new/main pip insta

Kaneki 104 Jan 02, 2023
A file-based quote bot written in Python

Let's Write a Python Quote Bot! This repository will get you started with building a quote bot in Python. It's meant to be used along with the Learnin

Florent 1 Dec 17, 2021
Robust and blazing fast open-redirect vulnerability scanner with ability of recursevely crawling all of web-forms, entry points, or links with data.

After Golismero project got dead there is no more any up to date open-source tool that can collect links with parametrs and web-forms and then test th

railway zeppelin 34 Aug 25, 2022
Python Client for Yandex Cloud Logging

Python Client for Yandex Cloud Logging Installation pip3 install python-yandex-cloud-logging Creating a Yandex Cloud Logging Group yc logging group c

MCode 0 Dec 08, 2021
A Telegram Bot to return Youtube Video Tags Using YoutubeTags API

YouTube-TagFind-Bot A Telegram Bot to return Youtube Video Tags Using YoutubeTags API YoutubeTags API Wrapper YoutubeTags is a python third-party api

Nuhman Pk 9 Aug 25, 2022
Materials to reproduce our findings in our stories, "Amazon Puts Its Own 'Brands' First Above Better-Rated Products" and "When Amazon Takes the Buy Box, it Doesn’t Give it up"

Amazon Brands and Exclusives This repository contains code to reproduce the findings featured in our story "Amazon Puts Its Own 'Brands' First Above B

The Markup 60 Nov 11, 2022
Osmnx-examples - Usage examples, demos, and tutorials for OSMnx.

OSMnx Examples OSMnx is a Python package to work with street networks and other spatial data from OpenStreetMap: retrieve, model, analyze, and visuali

Geoff Boeing 1.2k Jan 03, 2023
Ap lokit lokit

🎵 FANDA MUSIC BOT Fanda Music adalah proyek bot telegram yang memungkinkan Anda memutar musik di obrolan suara grup telegram. a href="https://www.py

Fatur 2 Nov 18, 2021
Unofficial calendar integration with Gradescope

Gradescope-Calendar This script scrapes your Gradescope account for courses and assignment details. Assignment details currently can be transferred to

6 May 06, 2022