TFIDF-based QA system for AIO2 competition

Last update: Feb 19, 2022

Overview

AIO2 TF-IDF Baseline

This is a very simple question answering system, developed as a lightweight baseline for the AIO2 competition.

In the training stage, the model builds a sparse matrix of TF-IDF features from the questions in the training dataset. In the inference stage, it answers an unseen question by computing dot-product scores between the TF-IDF features of the input and those of every training question, then returning the answer of the most similar training question.

Therefore, in principle, the model cannot predict an answer that does not appear in the training data.
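
The retrieval idea described above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the code in train.py or predict.py: the class name `TfidfNearestQuestion`, the whitespace tokenizer, the smoothed IDF formula, and the L2 normalization are all choices made for this example, and the toy questions are invented.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization, for illustration only.
    # Japanese text would need a morphological analyzer instead.
    return text.lower().split()


class TfidfNearestQuestion:
    """Hypothetical sketch: answer a question with the answer of the
    most similar training question under TF-IDF dot-product scoring."""

    def fit(self, questions, answers):
        self.answers = list(answers)
        docs = [Counter(tokenize(q)) for q in questions]
        n_docs = len(docs)
        # Document frequency: number of training questions containing each term.
        doc_freq = Counter(term for doc in docs for term in doc)
        # Smoothed IDF (an assumed variant; other formulas are possible).
        self.idf = {
            term: math.log((1 + n_docs) / (1 + df)) + 1.0
            for term, df in doc_freq.items()
        }
        # Sparse rows stored as {term: weight} dictionaries.
        self.rows = [self._vectorize(doc) for doc in docs]
        return self

    def _vectorize(self, counts):
        # Terms outside the training vocabulary are dropped.
        weights = {t: c * self.idf[t] for t, c in counts.items() if t in self.idf}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # L2-normalize so that the dot product equals cosine similarity.
        return {t: w / norm for t, w in weights.items()} if norm else {}

    def predict(self, question):
        query = self._vectorize(Counter(tokenize(question)))
        scores = [
            sum(w * row.get(t, 0.0) for t, w in query.items())
            for row in self.rows
        ]
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.answers[best]


train_questions = [
    "what is the capital of japan",
    "who wrote the tale of genji",
    "what is the highest mountain in japan",
]
train_answers = ["Tokyo", "Murasaki Shikibu", "Mount Fuji"]

model = TfidfNearestQuestion().fit(train_questions, train_answers)
prediction = model.predict("which mountain is the highest in japan")
print(prediction)
```

Whatever the input, `predict` can only return one of the three training answers, which is the limitation noted above. A question sharing no vocabulary with the training set scores zero against every row and falls back to the first training answer in this sketch.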

Steps to experiment with the model

Install requirements

$ pip install -r requirements.txt

Train

$ python train.py \
--train_file <data dir>/aio_02_train.jsonl \
--output_dir model \
--pos_list 名詞 \
--stop_words でしょ う \
--max_features 10000

Predict

$ python predict.py \
--model_dir model \
--test_file <data dir>/aio_02_dev_unlabeled_v1.0.jsonl \
--prediction_file <output dir>/predictions.jsonl

Building Docker image

$ docker build -t aio2-tfidf-baseline .

Test locally:

$ docker run --rm -v ":/app/input" -v ":/app/output" aio2-tfidf-baseline bash ./submission.sh input/aio_02_dev_unlabeled_v1.0.jsonl output/predictions.jsonl

Save the Docker image to a file:

$ docker save aio2-tfidf-baseline | gzip > aio2-tfidf-baseline.tar.gz

License

The code in this repository is open-sourced under the MIT License.

Owner

Masatoshi Suzuki