Implementation of a legal QA system based on SentenceKoBART

Last update: Dec 27, 2022

Overview

LegalQA using SentenceKoBART

Shows how to train SentenceKoBART
Built on the Jina neural search engine
Provides Korean legal QA data (1,830 pairs)

Setup

# Install Git LFS (see https://github.com/git-lfs/git-lfs/wiki/Installation)
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt install git-lfs
git clone https://github.com/haven-jeon/LegalQA.git
cd LegalQA
git lfs pull
pip install -r requirements.txt
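
After `git lfs pull`, the QA data file `data/legalqa.jsonlines` should be present locally. The following is a minimal sketch for inspecting it, assuming only that the file holds one JSON value per line (the usual JSON Lines layout). The helper names `read_jsonlines` and `summarize` are hypothetical and not part of the repository, and no field names are assumed.

```python
import json
from pathlib import Path


def read_jsonlines(path):
    """Yield one parsed JSON value per non-empty line of the file."""
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines instead of failing on them
                yield json.loads(line)


def summarize(path):
    """Return (record count, sorted top-level keys seen across records)."""
    count = 0
    keys = set()
    for record in read_jsonlines(path):
        count += 1
        if isinstance(record, dict):
            keys.update(record)  # collects the dict's keys
    return count, sorted(keys)
```

Calling `summarize("data/legalqa.jsonlines")` from the repository root gives a record count that can be compared with the 1,830 pairs stated above; whether each line corresponds to exactly one pair is an assumption to verify against the file.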

Index

python app.py -t index

GPU-based indexing is available as an option. To enable it, set the following in pods/encoder.yml:

on_gpu: true

Search

With REST API

To start the Jina server for the REST API:

python app.py -t query_restful

Then use a client to query:

curl --request POST -d '{"top_k": 1, "mode": "search",  "data": ["상속 관련 문의"]}' -H 'Content-Type: application/json' 'http://0.0.0.0:1234/api/search'

Or use Jinabox with endpoint http://127.0.0.1:1234/api/search
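
The same request can also be sent from Python. This is a sketch using only the standard library; it mirrors the URL, headers, and JSON body of the curl example above. The function names are hypothetical, and the response is returned as decoded JSON without assuming anything about its structure.

```python
import json
import urllib.request

# Endpoint taken from the example above; adjust if the server runs elsewhere.
SEARCH_URL = "http://127.0.0.1:1234/api/search"


def build_search_payload(question, top_k=1):
    """Build the same JSON body that the curl example sends."""
    return {"top_k": top_k, "mode": "search", "data": [question]}


def build_request(question, top_k=1, url=SEARCH_URL):
    """Create a POST request carrying the JSON payload."""
    body = json.dumps(build_search_payload(question, top_k)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(question, top_k=1, url=SEARCH_URL, timeout=30):
    """Send the query and return the decoded JSON response unchanged."""
    request = build_request(question, top_k, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server started via `python app.py -t query_restful`, calling `search("상속 관련 문의")` sends the same query as the curl command.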

From the terminal

python app.py -t query

Demo

http://ec2-3-36-123-253.ap-northeast-2.compute.amazonaws.com:7874/

Citation

Model training, data crawling, and demo system were all supported by the AWS Hero program.

@misc{heewon2021,
author = {Heewon Jeon},
title = {LegalQA using SentenceKoBART},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/haven-jeon/LegalQA}}
}

License

The QA data in data/legalqa.jsonlines was crawled from www.freelawfirm.co.kr in accordance with its robots.txt. Commercial use is prohibited; only academic use is permitted.
We are not responsible for any legal decisions you make based on the resources provided here.

Owner

Heewon Jeon (gogamza)