Large-scale open domain KNOwledge grounded conVERsation system based on PaddlePaddle

Last update: Dec 28, 2022


Overview

Knover

Knover is a toolkit for knowledge grounded dialogue generation based on PaddlePaddle. Knover allows researchers and developers to carry out efficient training/inference of large-scale dialogue generation models.

What's New:

December 2021: We are open-sourcing the dialogue generation model of PLATO-XL, with up to 11 billion parameters.
October 2021: We are open-sourcing AG-DST, an amendable generation approach for dialogue state tracking.
February 2021: We are open-sourcing our implementation (Team 19) for DSTC9-Track1.
July 2020: We are open-sourcing PLATO-2, a large-scale generative model with latent space for open-domain dialogue systems.

Requirements and Installation

- python version >= 3.7
- paddlepaddle-gpu version >= 2.0.0
  - You can install PaddlePaddle by following the instructions.
  - The specific version of PaddlePaddle also depends on your CUDA version (recommended: 10.1) and cuDNN version (recommended: 7.6). See the PaddlePaddle documentation on GPU support for more information.
- sentencepiece
- termcolor
- NCCL, if you want to run distributed training

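The requirements above can be checked before installation with a short script. This is a minimal sketch, not something shipped with Knover: it assumes `python3` is the interpreter you will use and that `sort -V` is available (GNU coreutils), and the helper names `version_ge`, `py_eval` and `check_requirements` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of Knover) for the requirements
# listed above. Assumes `python3` is the target interpreter and that
# `sort -V` (GNU coreutils version sort) is available.

# version_ge A B: exit 0 if version A >= version B.
version_ge() {
  # sort -V puts the smaller version first, so A >= B iff B comes out first.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# py_eval CODE: print whatever the Python snippet prints, or nothing on error.
py_eval() {
  python3 -c "$1" 2>/dev/null || true
}

check_requirements() {
  local status=0 py_ver pd_ver mod

  py_ver="$(py_eval 'import sys; print("%d.%d" % sys.version_info[:2])')"
  if [ -n "$py_ver" ] && version_ge "$py_ver" 3.7; then
    echo "ok:   python $py_ver"
  else
    echo "FAIL: python >= 3.7 required (found: ${py_ver:-none})"; status=1
  fi

  # The paddlepaddle-gpu package is imported as `paddle`.
  pd_ver="$(py_eval 'import paddle; print(paddle.__version__)')"
  if [ -n "$pd_ver" ] && version_ge "$pd_ver" 2.0.0; then
    echo "ok:   paddle $pd_ver"
  else
    echo "FAIL: paddlepaddle-gpu >= 2.0.0 required (found: ${pd_ver:-none})"; status=1
  fi

  # Remaining Python dependencies only need to be importable.
  for mod in sentencepiece termcolor; do
    if [ -n "$(py_eval "import $mod; print(1)")" ]; then
      echo "ok:   $mod"
    else
      echo "FAIL: $mod is not importable"; status=1
    fi
  done

  return "$status"
}

check_requirements || echo "Some requirements are missing; see the lines marked FAIL."
```

NCCL is not checked here because it is a system library rather than a Python module; it only matters for distributed training.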
Install Knover locally:

git clone https://github.com/PaddlePaddle/Knover.git
cd Knover
pip3 install -e .

Alternatively, you can skip installation and only set PYTHONPATH:

export PYTHONPATH=/abs/path/to/Knover:$PYTHONPATH
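If you take the PYTHONPATH route in a shell profile, a small helper can avoid adding the same directory twice or leaving a trailing `:` when PYTHONPATH was previously unset. The sketch below is illustrative only: `add_to_pythonpath` is a hypothetical helper, `/abs/path/to/Knover` is the same placeholder as above, and the import check assumes the repository's Python package directory is named `knover`.

```shell
# Hypothetical helper (not part of Knover): prepend a directory to
# PYTHONPATH once, without leaving an empty entry when PYTHONPATH is unset.
add_to_pythonpath() {
  case ":${PYTHONPATH:-}:" in
    *":$1:"*) ;;  # already present: leave PYTHONPATH unchanged
    *) PYTHONPATH="$1${PYTHONPATH:+:$PYTHONPATH}" ;;
  esac
  export PYTHONPATH
}

add_to_pythonpath /abs/path/to/Knover

# Sanity check, assuming the package directory inside the repo is `knover`.
python3 -c "import knover, os; print(os.path.dirname(knover.__file__))" 2>/dev/null \
  || echo "knover is not importable; check the path passed to add_to_pythonpath."
```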

Basic usage

See the usage document.

Disclaimer

This project aims to facilitate further research progress in dialogue generation. Baidu is not responsible for any content generated by third parties using the pre-trained system.

Contact information

For help or issues using Knover, please submit a GitHub issue.