Choseong Interpreter based on Ko-BART

Last update: Oct 28, 2022

Choseong Interpreter

Overview

Given a Korean sentence written only in choseong (the initial consonant of each syllable), this interpreter predicts the complete sentence.

Choseong: ㄴㄴ ㄴㄹ ㅈㅇㅎ
Predicted sentence: 나는 너를 좋아해
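To make the input format concrete, here is a minimal sketch of how a sentence reduces to its choseong string. It relies on the standard Unicode layout of precomposed Hangul syllables; `to_choseong` is a hypothetical helper name and is not taken from this repository.

```python
# Hypothetical helper, not part of the repository.
# Precomposed Hangul syllables occupy U+AC00..U+D7A3 and are ordered as
# (initial consonant, vowel, final consonant), so the initial consonant
# index is the syllable offset divided by 21 vowels * 28 final slots.
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"  # 19 initial consonants, Unicode order
HANGUL_BASE = 0xAC00  # '가'
HANGUL_LAST = 0xD7A3  # '힣'
SYLLABLES_PER_CHOSEONG = 21 * 28


def to_choseong(text: str) -> str:
    """Replace every Hangul syllable with its initial consonant.

    Spaces, punctuation, and non-Hangul characters pass through unchanged.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            out.append(CHOSEONG[(code - HANGUL_BASE) // SYLLABLES_PER_CHOSEONG])
        else:
            out.append(ch)
    return "".join(out)


print(to_choseong("나는 너를 좋아해"))  # ㄴㄴ ㄴㄹ ㅈㅇㅎ
```

The mapping is many-to-one, which is why recovering the sentence requires a learned model rather than a lookup.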

Model

The model uses Ko-BART, released by SKT-AI.

Data

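Any corpus made up of individual sentences can be used. However, inference quality depends heavily on the domain and size of the data, so choose a corpus that matches the performance you need. A dummy dataset is included in the ./data directory; prepare your corpus in the same format as the dummy dataset.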
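Because the choseong form can be derived from the sentence itself, a plain sentence corpus is enough to produce (input, target) pairs. The sketch below illustrates that idea only. `build_pairs` is a hypothetical name, and the pair layout is an assumption; the format the training script actually expects is the one shown by the dummy dataset in ./data.

```python
from typing import Iterable, List, Tuple

# Hypothetical sketch, not the repository's data pipeline.
_CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
_LAST_OFFSET = 0xD7A3 - 0xAC00  # offset of the last precomposed syllable


def _initial(ch: str) -> str:
    """Return the initial consonant of a Hangul syllable, or ch unchanged."""
    offset = ord(ch) - 0xAC00
    if 0 <= offset <= _LAST_OFFSET:
        return _CHOSEONG[offset // (21 * 28)]
    return ch


def build_pairs(sentences: Iterable[str]) -> List[Tuple[str, str]]:
    """Turn raw sentences into (choseong input, full sentence target) pairs."""
    pairs = []
    for line in sentences:
        sentence = line.strip()
        if not sentence:  # skip blank lines
            continue
        source = "".join(_initial(ch) for ch in sentence)
        pairs.append((source, sentence))
    return pairs


for source, target in build_pairs(["배고프다\n", "나는 너를 좋아해\n"]):
    print(source, "->", target)
```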

Training

python run_train.py

Inference

python run_inference.py --finetuned-model-path $FINETUNED_MODEL_PATH

Examples

Inference results from a model trained on a publicly available corpus.

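Choseong: ㅂㄱㅍㄷ 	 Predicted sentence: 배고픈데
Choseong: ㅂㄱㅍㄷ 	 Predicted sentence: 배고프다
Choseong: ㅂㄱㅍㄷ 	 Predicted sentence: 배고프대

Choseong: ㄴㅁㄴㅁ ㅅㄹㅎㅇ 	 Predicted sentence: 너무너무 사랑해요
Choseong: ㄴㅁㄴㅁ ㅅㄹㅎㅇ 	 Predicted sentence: 너무너무 사랑했어
Choseong: ㄴㅁㄴㅁ ㅅㄹㅎㅇ 	 Predicted sentence: 나만너무 사랑해요

Choseong: ㄴㄴ ㄴㄹ ㅈㅇㅎ 	 Predicted sentence: 나는 너를 좋아해
Choseong: ㄴㄴ ㄴㄹ ㅈㅇㅎ 	 Predicted sentence: 누나 나랑 좋아해
Choseong: ㄴㄴ ㄴㄹ ㅈㅇㅎ 	 Predicted sentence: 너는 나를 좋아해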
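Each input above has several valid completions, and every listed prediction reduces back to the input choseong exactly. A generative model is not guaranteed to preserve that property, so one possible post-processing step is to drop candidates that do not match. This is only a sketch under that assumption: `matches` and `filter_candidates` are hypothetical names, and nothing here says the repository performs such a check.

```python
from typing import List

# Hypothetical post-processing sketch, not part of the repository.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
LAST_OFFSET = 0xD7A3 - 0xAC00  # offset of the last precomposed Hangul syllable


def matches(choseong: str, candidate: str) -> bool:
    """True if candidate has the same length and per-character initials as choseong."""
    if len(choseong) != len(candidate):
        return False
    for want, ch in zip(choseong, candidate):
        offset = ord(ch) - 0xAC00
        got = INITIALS[offset // (21 * 28)] if 0 <= offset <= LAST_OFFSET else ch
        if got != want:
            return False
    return True


def filter_candidates(choseong: str, candidates: List[str]) -> List[str]:
    """Keep only the candidates consistent with the input, preserving order."""
    return [c for c in candidates if matches(choseong, c)]


kept = filter_candidates(
    "ㄴㄴ ㄴㄹ ㅈㅇㅎ",
    ["나는 너를 좋아해", "나는 너를 사랑해", "너는 나를 좋아해"],
)
print(kept)  # the second candidate is dropped: 사랑해 reduces to ㅅㄹㅎ, not ㅈㅇㅎ
```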

Notes

This repository does not include any training data.
This repository follows Ko-BART's modified-MIT license.

Todo

Add test code

Owner

Dawoon Jung
