KLUE-baseline contains the baseline code for the Korean Language Understanding Evaluation (KLUE) benchmark.

Last update: Dec 13, 2022

KLUE Baseline

KLUE-baseline contains the baseline code for the Korean Language Understanding Evaluation (KLUE) benchmark. See our paper for more details about KLUE and the baselines.

Dependencies

Make sure you have installed the packages listed in requirements.txt.

pip install -r requirements.txt

All experiments were tested in a Python 3.7 environment.

KLUE Benchmark Datasets

All train/dev sets of KLUE tasks are publicly available in this repo. You can access them by using git submodules. To clone the repo with datasets:

git clone --recursive https://github.com/KLUE-benchmark/KLUE-Baseline.git

Or, if you have already cloned the repo, download the datasets with:

git submodule update --init --recursive
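To confirm that the submodules were actually fetched, a small check like the following may help. It is only a sketch: it reads the standard `.gitmodules` file that git keeps at the repository root and reports any declared submodule directory that is missing or empty. The function names are hypothetical and are not part of this repo.

```python
import re
from pathlib import Path

# Matches the "path = ..." entries that git writes into .gitmodules.
_PATH_LINE = re.compile(r"^\s*path\s*=\s*(.+?)\s*$")


def submodule_paths(gitmodules_text):
    """Return the submodule paths declared in the text of a .gitmodules file."""
    paths = []
    for line in gitmodules_text.splitlines():
        match = _PATH_LINE.match(line)
        if match:
            paths.append(match.group(1))
    return paths


def missing_submodules(repo_root):
    """Return declared submodule paths that are absent or empty under repo_root."""
    root = Path(repo_root)
    gitmodules = root / ".gitmodules"
    if not gitmodules.is_file():
        return []  # no submodules are declared in this checkout
    missing = []
    for rel_path in submodule_paths(gitmodules.read_text(encoding="utf-8")):
        target = root / rel_path
        # An uninitialized submodule is either absent or an empty directory.
        if not target.is_dir() or not any(target.iterdir()):
            missing.append(rel_path)
    return missing


if __name__ == "__main__":
    # Assumes the script is run from the root of the cloned repository.
    for path in missing_submodules("."):
        print(f"submodule not initialized: {path}")
```

If the script prints nothing, every declared submodule directory contains files. Otherwise, rerunning the `git submodule update` command above should populate the listed paths.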

The test sets are not publicly available. To measure your model's performance on a test set, first train the model on the train set and then submit it to our submission system. Alternatively, you can compare your dev set performance with that of our baseline models, which is also reported in our paper.

Train

To reproduce our baselines, run run_all.sh.

NOTE: klue/roberta models accept inputs of at most 510 tokens. Details are explained here.
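As an illustration of working within that limit, the sketch below caps a token sequence at 510 items, or splits a longer one into consecutive chunks. It operates on plain Python lists and does not call any tokenizer. The helper names are hypothetical, and the note does not say whether the 510 figure includes special tokens, so treat the constant as an assumption to adjust. This is not necessarily how the baseline scripts handle long inputs.

```python
# Limit quoted in the note above; whether it counts special tokens is an assumption.
MAX_INPUT_TOKENS = 510


def truncate_tokens(tokens, max_tokens=MAX_INPUT_TOKENS):
    """Keep only the first max_tokens items of a token sequence."""
    if max_tokens < 0:
        raise ValueError("max_tokens must be non-negative")
    return list(tokens[:max_tokens])


def split_into_windows(tokens, max_tokens=MAX_INPUT_TOKENS):
    """Split a token sequence into consecutive chunks of at most max_tokens items."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [
        list(tokens[start:start + max_tokens])
        for start in range(0, len(tokens), max_tokens)
    ]


if __name__ == "__main__":
    token_ids = list(range(1200))  # stand-in for real token ids
    print(len(truncate_tokens(token_ids)))
    print([len(chunk) for chunk in split_into_windows(token_ids)])
```

Truncation discards everything past the limit, while windowing keeps all tokens at the cost of several forward passes. Which one is appropriate depends on the task.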

Reference

If you use this code or KLUE, please cite:

@misc{park2021klue,
      title={KLUE: Korean Language Understanding Evaluation}, 
      author={Sungjoon Park and Jihyung Moon and Sungdong Kim and Won Ik Cho and Jiyoon Han and Jangwon Park and Chisung Song and Junseong Kim and Yongsook Song and Taehwan Oh and Joohong Lee and Juhyun Oh and Sungwon Lyu and Younghoon Jeong and Inkwon Lee and Sangwoo Seo and Dongjun Lee and Hyunwoo Kim and Myeonghwa Lee and Seongbo Jang and Seungwon Do and Sunkyoung Kim and Kyungtae Lim and Jongwon Lee and Kyumin Park and Jamin Shin and Seonghyun Kim and Lucy Park and Alice Oh and Jung-Woo Ha and Kyunghyun Cho},
      year={2021},
      eprint={2105.09680},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Contribution

Feel free to open an issue if you have any questions or comments. To contribute, please run make style before creating a pull request.
