Korean stereoypte detector with TUNiB-Electra and K-StereoSet

Overview

Korean Stereotype Detector

Korean stereotype sentence classifier using K-StereoSet with TUNiB-Electra

Web demo

Dataset

  • There are 12,688 human-labeled comments in total.
  • They ar splitted into 10,688 train set, 1000 test set and 1000 valid set
stereotype, anti-stereotype, unrelated, profession, race, gender, religion, comment
1.0,    0.0,    0.0,    1.0,    0.0,    0.0,    0.0,    축구선수의 오랜 경력의 전형적인 결과는 부상을 입은 발목과 두개골이다.
1.0,    0.0,    0.0,    0.0,    1.0,    0.0,    0.0,    콜롬비아 사람들은 모두 마약 카르텔에 소속되어 있다.
0.0,    1.0,    0.0,    0.0,    1.0,    0.0,    0.0,    그는 무기를 소지한 적이 없다.
0.0,    1.0,    0.0,    0.0,    0.0,    1.0,    0.0,    그는 주말에만 지루할 수 있는 일을 한다.

Detail

stereotype anti-stereotype unrelated profession race gender religion Total
Train 3,550 3,556 3,581 4,140 4,896 1,268 383 10,688
Valid 341 347 312 410 435 110 45 1,000
Test 334 324 336 361 483 113 43 1,000

Score

precision recall F1
stereotype 0.814 0.601 0.691
anti-stereotype 0.894 0.509 0.648
unrelated 0.872 0.870 0.871
profession 0.943 0.711 0.811
race 0.787 0.907 0.843
gender 0.639 0.836 0.724
religion 0.724 1.0 0.840
total (macro score) 0.810 0.776 0.775

Usage

  • training
python3 train.py --model_name tunib/electra-ko-base \
                 --data_dir YOUR_PATH \
                 --batch_size BATCH_SIZE \
  • threshold optimizing
python3 threshold.py --model_name tunib/electra-ko-base \
                     --data_dir YOUR_CKPT_DIR_PATH \
                     --file_path YOUR_CKPT_FILE_NAME \
                     --batch_size BATCH_SIZE \
                     --data_path TEST_DATA_PATH
  • test
python3 score.py --model_name tunib/electra-ko-base \
                 --data_dir YOUR_CKPT_DIR_PATH \
                 --file_path YOUR_CKPT_FILE_NAME \
                 --batch_size BATCH_SIZE \
                 --data_path TEST_DATA_PATH
Owner
Sae_Chan_Oh
Schrödingers Katze
Sae_Chan_Oh
A repo for open resources & information for people to succeed in PhD in CS & career in AI / NLP

A repo for open resources & information for people to succeed in PhD in CS & career in AI / NLP

420 Dec 28, 2022
Pipeline for chemical image-to-text competition

BMS-Molecular-Translation Introduction This is a pipeline for Bristol-Myers Squibb – Molecular Translation by Vadim Timakin and Maksim Zhdanov. We got

Maksim Zhdanov 7 Sep 20, 2022
Named-entity recognition using neural networks. Easy-to-use and state-of-the-art results.

NeuroNER NeuroNER is a program that performs named-entity recognition (NER). Website: neuroner.com. This page gives step-by-step instructions to insta

Franck Dernoncourt 1.6k Dec 27, 2022
Based on 125GB of data leaked from Twitch, you can see their monthly revenues from 2019-2021

Twitch Revenues Bu script'i kullanarak istediğiniz yayıncıların, Twitch'den sızdırılan 125 GB'lik veriye dayanarak, 2019-2021 arası aylık gelirlerini

4 Nov 11, 2021
A python wrapper around the ZPar parser for English.

NOTE This project is no longer under active development since there are now really nice pure Python parsers such as Stanza and Spacy. The repository w

ETS 49 Sep 12, 2022
This project converts your human voice input to its text transcript and to an automated voice too.

Human Voice to Automated Voice & Text Introduction: In this project, whenever you'll speak, it will turn your voice into a robot voice and furthermore

Hassan Shahzad 3 Oct 15, 2021
Pipeline for fast building text classification TF-IDF + LogReg baselines.

Text Classification Baseline Pipeline for fast building text classification TF-IDF + LogReg baselines. Usage Instead of writing custom code for specif

Dani El-Ayyass 57 Dec 07, 2022
The proliferation of disinformation across social media has led the application of deep learning techniques to detect fake news.

Fake News Detection Overview The proliferation of disinformation across social media has led the application of deep learning techniques to detect fak

Kushal Shingote 1 Feb 08, 2022
Beautiful visualizations of how language differs among document types.

Scattertext 0.1.0.0 A tool for finding distinguishing terms in corpora and displaying them in an interactive HTML scatter plot. Points corresponding t

Jason S. Kessler 2k Dec 27, 2022
Backend for the Autocomplete platform. An AI assisted coding platform.

Introduction A custom predictor allows you to deploy your own prediction implementation, useful when the existing serving implementations don't fit yo

Tatenda Christopher Chinyamakobvu 1 Jan 31, 2022
This repository contains the codes for LipGAN. LipGAN was published as a part of the paper titled "Towards Automatic Face-to-Face Translation".

LipGAN Generate realistic talking faces for any human speech and face identity. [Paper] | [Project Page] | [Demonstration Video] Important Update: A n

Rudrabha Mukhopadhyay 438 Dec 31, 2022
An assignment on creating a minimalist neural network toolkit for CS11-747

minnn by Graham Neubig, Zhisong Zhang, and Divyansh Kaushik This is an exercise in developing a minimalist neural network toolkit for NLP, part of Car

Graham Neubig 63 Dec 29, 2022
Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition

SEW (Squeezed and Efficient Wav2vec) The repo contains the code of the paper "Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speec

ASAPP Research 67 Dec 01, 2022
Shellcode antivirus evasion framework

Schrodinger's Cat Schrodinger'sCat is a Shellcode antivirus evasion framework Technical principle Please visit my blog https://idiotc4t.com/ How to us

idiotc4t 27 Jul 09, 2022
🤗Transformers: State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0.

State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0 🤗 Transformers provides thousands of pretrained models to perform tasks o

Hugging Face 77.3k Jan 03, 2023
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism This repository is the official PyTorch implementation of our AAAI-2022 paper, in

Jinglin Liu 829 Jan 07, 2023
Trex is a tool to match semantically similar functions based on transfer learning.

Trex is a tool to match semantically similar functions based on transfer learning.

62 Dec 28, 2022
NLP and Text Generation Experiments in TensorFlow 2.x / 1.x

Code has been run on Google Colab, thanks Google for providing computational resources Contents Natural Language Processing(自然语言处理) Text Classificati

1.5k Nov 14, 2022
PyTranslator é simultaneamente um editor e tradutor de texto com diversos recursos e interface feito com coração e 100% em Python

PyTranslator O Que é e para que serve o PyTranslator? PyTranslator é simultaneamente um editor e tradutor de texto em com interface gráfica que usa a

Elizeu Barbosa Abreu 1 May 12, 2022