Chinese Question Generator, using the Delta Reading Comprehension Dataset (DRCD)

Overview

Transformer QG on DRCD

The model input is built as follows.

We integrate the context C and the answer A into a new C', marking the answer span with highlight tokens [HL], in the following form:
C' = [c1, c2, ..., [HL], a1, ..., a|A|, [HL], ..., c|C|]
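
The highlighting step above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's own preprocessing code: the function name `highlight_answer` and its parameters are hypothetical, and it works on characters of the raw string, whereas the formula above is written over tokens.

```python
HL_TOKEN = "[HL]"  # highlight token, as written in the formula above


def highlight_answer(context: str, answer_start: int, answer_text: str,
                     hl_token: str = HL_TOKEN) -> str:
    """Wrap the answer span inside the context with highlight tokens.

    Hypothetical helper: answer_start is a character offset into context.
    """
    answer_end = answer_start + len(answer_text)
    # Guard against an offset that does not point at the given answer.
    if context[answer_start:answer_end] != answer_text:
        raise ValueError("answer_text does not match context at answer_start")
    return (context[:answer_start] + hl_token + answer_text
            + hl_token + context[answer_end:])


context = "伊隆·里夫·馬斯克是一名企業家和商業大亨"
print(highlight_answer(context, 0, "伊隆·里夫·馬斯克"))
# [HL]伊隆·里夫·馬斯克[HL]是一名企業家和商業大亨
```

The resulting string has the same shape as the context sent in the request example under "Run as API server".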

Proposed by Ying-Hong Chan & Yao-Chung Fan (2019). A Recurrent BERT-based Model for Question Generation.

We also have an English QG project: Transformer-QG-on-SQuAD

Features

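  • Complete pipeline, from fine-tuning to model scoring
  • Supports many state-of-the-art language models
  • Built-in Flask, so the model can quickly be served as an API server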

DRCD dataset

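The Delta Reading Comprehension Dataset (DRCD) is a general-domain Traditional Chinese machine reading comprehension dataset. It contains 10,014 paragraphs drawn from 2,108 Wikipedia articles, with more than 30,000 questions annotated on those paragraphs.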

Available models

Use in Transformers

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
  
tokenizer = AutoTokenizer.from_pretrained("p208p2002/bart-drcd-qg-hl")

model = AutoModelForSeq2SeqLM.from_pretrained("p208p2002/bart-drcd-qg-hl")
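
Once the tokenizer and model are loaded, generating a question might look like the sketch below. The helper names `strip_spaces` and `generate_question`, and the decoding settings (`max_length`, `num_beams`), are illustrative assumptions and not values taken from this repository. Removing spaces from the decoded text assumes a BERT-style tokenizer that separates Chinese characters with spaces, and would also remove spaces from any non-Chinese text in the output.

```python
def strip_spaces(text: str) -> str:
    """Remove the spaces a character-level tokenizer may insert when decoding."""
    return text.replace(" ", "")


def generate_question(context_with_hl: str,
                      model_name: str = "p208p2002/bart-drcd-qg-hl",
                      max_length: int = 64,
                      num_beams: int = 4) -> str:
    """Generate one question for a context whose answer is wrapped in [HL]."""
    # Imported lazily so the helper above can be used without transformers.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    inputs = tokenizer(context_with_hl, return_tensors="pt",
                       truncation=True, max_length=512)
    # Pass only the tensors generate() needs; decoding settings are assumptions.
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=max_length,
        num_beams=num_beams,
    )
    decoded = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return strip_spaces(decoded)
```

Calling `generate_question("[HL]伊隆·里夫·馬斯克[HL]是一名企業家和商業大亨")` downloads the model on first use, so it needs network access plus the transformers and PyTorch packages.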

Experiments

Model        BLEU-1  BLEU-2  BLEU-3  BLEU-4  METEOR  ROUGE-L
BART-HLSQG   34.25   27.70   22.43   18.13   23.58   36.88

Environment requirements

All development was done on Ubuntu.

  1. If you don't have PyTorch 1.6+, install or update it first: https://pytorch.org/get-started/locally/

  2. Install packages: pip install -r requirements.txt

  3. Set up the scorer: python setup_scorer.py

  4. Download the dataset: python init_dataset.py

Training

Seq2Seq LM

usage: train_seq2seq_lm.py [-h]
                           [--base_model {bert-base-chinese,uer/bart-base-chinese-cluecorpussmall,p208p2002/bart-drcd-qg-hl}]
                           [-d {drcd}] [--batch_size BATCH_SIZE]
                           [--epoch EPOCH] [--lr LR] [--dev DEV] [--server]
                           [--run_test] [-fc FROM_CHECKPOINT]

optional arguments:
  -h, --help            show this help message and exit
  --base_model {bert-base-chinese,uer/bart-base-chinese-cluecorpussmall,p208p2002/bart-drcd-qg-hl}
  -d {drcd}, --dataset {drcd}
  --batch_size BATCH_SIZE
  --epoch EPOCH
  --lr LR
  --dev DEV
  --server
  --run_test
  -fc FROM_CHECKPOINT, --from_checkpoint FROM_CHECKPOINT

Run as API server

From pre-trained (recommended)

python train_seq2seq_lm.py --server --base_model p208p2002/bart-drcd-qg-hl

From your own checkpoint

python train_xxx_lm.py --server --base_model YOUR_BASE_MODEL --from_checkpoint FROM_CHECKPOINT

Request example

curl --location --request POST 'http://127.0.0.1:5000/' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--data-urlencode 'context=[HL]伊隆·里夫·馬斯克[HL]是一名企業家和商業大亨'
{"predict": "哪一個人是一名企業家和商業大亨?"}
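
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above (same URL, same form field `context`, same `predict` key in the response); the function names `build_request` and `ask` are hypothetical and not part of this repository.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "http://127.0.0.1:5000/"  # address used in the curl example above


def build_request(context: str, url: str = API_URL) -> Request:
    """Build a form-encoded POST request carrying the highlighted context."""
    body = urlencode({"context": context}).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(url, data=body, headers=headers, method="POST")


def ask(context: str, url: str = API_URL) -> str:
    """Send the request and return the generated question.

    Assumes the server replies with JSON shaped like {"predict": "..."}.
    """
    with urlopen(build_request(context, url)) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload["predict"]
```

With the server running, `ask("[HL]伊隆·里夫·馬斯克[HL]是一名企業家和商業大亨")` should return the string stored under `predict` in the response.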