APEACH: Attacking Pejorative Expressions with Analysis on Crowd-generated Hate Speech Evaluation Datasets

Last update: Dec 06, 2022

Overview

APEACH - Korean Hate Speech Evaluation Datasets

APEACH is the first crowd-generated Korean evaluation dataset for hate speech detection. The sentences in the dataset were written by anonymous participants on DeepNatural AI, an online crowdsourcing platform.

Download

You can download the APEACH benchmark set from APEACH/test.csv in this repository.
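Once the file is downloaded, it can be read with the Python standard library. The following is a minimal sketch, not the authors' loader: the column names `text` and `class` are assumptions about the CSV header, and the helper names `load_apeach` and `label_counts` are hypothetical. Adjust the column names to match the file you actually have.

```python
import csv
from collections import Counter
from pathlib import Path


def load_apeach(path, text_col="text", label_col="class"):
    """Read (text, label) pairs from an APEACH-style CSV file.

    The default column names are assumptions; change them to match the
    header of the downloaded file.
    """
    with Path(path).open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # Fail early with a clear message if the assumed columns are absent.
        missing = {text_col, label_col} - set(reader.fieldnames or [])
        if missing:
            raise KeyError(f"columns not found in {path}: {sorted(missing)}")
        return [(row[text_col], int(row[label_col])) for row in reader]


def label_counts(examples):
    """Count how many examples carry each label."""
    return Counter(label for _, label in examples)
```

For example, `label_counts(load_apeach("APEACH/test.csv"))` returns a per-label count, which is a quick way to check class balance before running an evaluation.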

Dataset Description

APEACH: A hate speech evaluation dataset generated in 2021, using the generation method described in the APEACH paper.

Guidelines

APEACH-GUIDELINE

Topics

Lengths

Paper

https://arxiv.org/pdf/2202.12459.pdf

Experiment Code

Experiment Results

Model	BEEP! Dev Dataset	APEACH (Ours)
SoongsilBERT-Base	0.8261	0.8424
SoongsilBERT-Small	0.8149	0.8228
KcBERT-base	0.8088	0.8086
KcBERT-large	0.8295	0.8116
DistillKoBERT	0.7570	0.7715
KoELECTRA-V3	0.7920	0.8101
KoBERT	0.8030	0.7885
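To make the table easier to compare, the sketch below computes the per-model gap between the two columns and ranks the models on each one. The scores are copied from the table above; the helper names `score_gaps` and `rank_by` are hypothetical, and since this section does not name the metric, the gap is simply the difference between the reported scores.

```python
# Scores copied from the results table: (BEEP! dev, APEACH).
SCORES = {
    "SoongsilBERT-Base": (0.8261, 0.8424),
    "SoongsilBERT-Small": (0.8149, 0.8228),
    "KcBERT-base": (0.8088, 0.8086),
    "KcBERT-large": (0.8295, 0.8116),
    "DistillKoBERT": (0.7570, 0.7715),
    "KoELECTRA-V3": (0.7920, 0.8101),
    "KoBERT": (0.8030, 0.7885),
}


def score_gaps(scores):
    """Return {model: APEACH score minus BEEP! dev score}, rounded to 4 places."""
    return {
        name: round(apeach - beep, 4)
        for name, (beep, apeach) in scores.items()
    }


def rank_by(scores, column):
    """Return model names sorted best-first on column 0 (BEEP! dev) or 1 (APEACH)."""
    return sorted(scores, key=lambda name: scores[name][column], reverse=True)
```

With these numbers, four of the seven models score higher on APEACH than on the BEEP! dev set. KoELECTRA-V3 gains the most (+0.0181) and KcBERT-large loses the most (-0.0179), which moves KcBERT-large from first place on BEEP! dev to third on APEACH.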

We also share the best model trained on our dataset in this experiment, available as a checkpoint, a demo website, and an API.

Citation

@article{yang2022apeach,
  title={APEACH: Attacking Pejorative Expressions with Analysis on Crowd-Generated Hate Speech Evaluation Datasets},
  author={Yang, Kichang and Jang, Wonjun and Cho, Won Ik},
  journal={arXiv preprint arXiv:2202.12459},
  year={2022}
}

Contributors

The main contributors of the work (*: equal contribution):

License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Owner

Kevin-Yang
