For converting Tagtog annotations into a CSV dataset

Overview

tagtog_relation_extraction

  • Converts Tagtog annotation files into a CSV dataset for relation extraction

How to Use

On Tagtog

1. Go to Project > Downloads
2. Download all documents using the download button in the Downloads page

On Local

1. Place folders and files according to the structure specified below (a layout-check sketch follows step 3):

tagtog_relation_extraction
├── main.py
├── util.py
├── .gitignore
├── README.md
├── requirements.txt
└── Your_download_file_Name
    ├── annotations-legend.json
    ├── ann.json
    │   └── master
    │       └── pool/
    ├── plain.html
    │   └── pool/
    ├── guidelines.md
    └── README.md

2. Install the required packages

  • tqdm==4.62.3
  • pandas==1.1.5
  • beautifulsoup4==4.10.0

$ pip install -r $ROOT/tagtog_relation_extraction/requirements.txt

3. Run

$ python main.py --path Your_download_file_Name
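
(Optional) Before running main.py, you can sanity-check that the downloaded folder matches the layout from step 1. The snippet below is a minimal sketch and is not part of this repository; my_tagtog_download is a placeholder for Your_download_file_Name.

# check_layout.py -- minimal sketch (not part of this repo) for verifying the
# expected Tagtog download layout before running main.py.
from pathlib import Path
import sys

download_dir = Path("my_tagtog_download")  # placeholder for Your_download_file_Name

expected = [
    download_dir / "annotations-legend.json",
    download_dir / "ann.json" / "master" / "pool",
    download_dir / "plain.html" / "pool",
]

missing = [str(path) for path in expected if not path.exists()]
if missing:
    sys.exit("Missing expected files/folders:\n" + "\n".join(missing))
print("Tagtog download layout looks OK.")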

Result

1. Dataset file (dataset.csv); see the loading and validation sketch after the examples below

sentence: 가장 가능성이 높은 새 대안은 플랑크 상수를 통해 질량을 정의하는 방안이다.질량의 단위는 킬로그램 외에도 여러가지가 있는데, 그중 대표적인 단위가 바로 원자질량단위이다
sub_tag: {'word': '원자질량단위', 'start_idx': 85, 'end_idx': 90, 'type': 'POH'}
obj_tag: {'word': '플랑크 상수', 'start_idx': 17, 'end_idx': 22, 'type': 'POH'}
label: POH:no_relation

2. File for checking answers (answer_check.csv)

  • CSV file designed for checking entity taggings and labels
  • example:
sentence: 가장 가능성이 높은 새 대안은 플랑크 상수를 통해 질량을 정의하는 방안이다.질량의 단위는 킬로그램 외에도 여러가지가 있는데, 그중 대표적인 단위가 바로 원자질량단위이다
sub_tag: POH
obj_tag: POH
label: POH:no_relation
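
To work with dataset.csv programmatically, the sketch below (not part of this repository) loads it with pandas, parses the sub_tag and obj_tag columns, which the examples above show as stringified Python dicts, and spot-checks that each recorded span matches its word. The column names and the inclusive end_idx convention are read off the example rows; verify them against your own output.

# Minimal sketch (not part of this repo): load dataset.csv and spot-check entity offsets.
import ast
import pandas as pd

df = pd.read_csv("dataset.csv")

# sub_tag / obj_tag are stored as stringified dicts, e.g.
# "{'word': '플랑크 상수', 'start_idx': 17, 'end_idx': 22, 'type': 'POH'}"
for col in ("sub_tag", "obj_tag"):
    df[col] = df[col].apply(ast.literal_eval)

# end_idx appears to be inclusive in the examples above, hence the +1 when slicing.
for _, row in df.iterrows():
    for col in ("sub_tag", "obj_tag"):
        tag = row[col]
        span = row["sentence"][tag["start_idx"] : tag["end_idx"] + 1]
        if span != tag["word"]:
            print(f"Mismatch in {col}: expected {tag['word']!r}, got {span!r}")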

    
   

Restrictions

  • Entity labels must follow the form below (see the parsing sketch after this list):
SUBJ-{ENT_TYPE}-{RELATION_NAME}
OBJ-{ENT_TYPE}-{RELATION_NAME}
  • If your labels use a different format, you may need to modify util.py accordingly
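
For reference, a label such as SUBJ-POH-no_relation (a hypothetical example) can be split into its parts as in the sketch below. This is only an illustration of the naming scheme, not the actual parsing logic in util.py.

# Minimal sketch (not the actual util.py logic): split an entity label of the form
# SUBJ-{ENT_TYPE}-{RELATION_NAME} / OBJ-{ENT_TYPE}-{RELATION_NAME} into its parts.
def parse_entity_label(label: str) -> dict:
    role, ent_type, relation = label.split("-", 2)
    if role not in ("SUBJ", "OBJ"):
        raise ValueError(f"Unexpected role {role!r} in label {label!r}")
    return {"role": role, "ent_type": ent_type, "relation": relation}

print(parse_entity_label("SUBJ-POH-no_relation"))
# {'role': 'SUBJ', 'ent_type': 'POH', 'relation': 'no_relation'}
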
Owner

hyeong
Data Analyst / AI Engineer
CV: https://bit.ly/2YMgTXd