Multilingual emotion classification using BERT (fine-tuning). Published at the WASSA workshop (ACL 2022).

Last update: Sep 17, 2022

Overview

XLM-EMO: Multilingual Emotion Prediction in Social Media Text

Abstract

Detecting emotion in text allows social and computational scientists to study how people behave and react to online events. However, developing these tools for different languages requires data that is not always available. This paper collects the available emotion detection datasets across 19 languages. We train a multilingual emotion prediction model for social media data, XLM-EMO. The model shows competitive performance in a zero-shot setting, suggesting it is helpful in the context of low-resource languages. We release our model to the community so that interested researchers can directly use it.

See the paper for additional details:

Bianchi, F., Nozza, D., & Hovy, D. "XLM-EMO: Multilingual Emotion Prediction in Social Media Text". In Proceedings of the 12th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (Forthcoming). Association for Computational Linguistics, 2022.

Free software: MIT license

Installing

pip install -U xlm-emo

Important: If you want to use CUDA, you need to install the CUDA version that matches your distribution; see the PyTorch documentation.
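
A quick way to check whether the installed PyTorch build can actually see a GPU is sketched below. It assumes PyTorch is the backend, as the note above implies. The helper name `describe_torch_device` is hypothetical and is not part of xlm-emo.

```python
import importlib.util


def describe_torch_device():
    """Summarize the installed PyTorch build and the device it can use.

    Hypothetical helper for illustration; not part of the xlm-emo package.
    """
    # Avoid an ImportError when PyTorch is not installed at all.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False, "cuda_build": None, "device": "cpu"}

    import torch

    # torch.version.cuda is None for CPU-only builds of PyTorch.
    cuda_build = torch.version.cuda
    # is_available() is True only when a usable GPU and driver are found.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return {"torch_installed": True, "cuda_build": cuda_build, "device": device}


if __name__ == "__main__":
    print(describe_torch_device())
```

If `cuda_build` is `None` or `device` is `"cpu"` on a machine with a GPU, the installed PyTorch build probably does not match the local CUDA setup.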

Features

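from xlm_emo.classifier import EmotionClassifier

# Instantiate the classifier
ec = EmotionClassifier()

# Predict one emotion label per input text (the first text is an Italian insult)
ec.predict(["senti testa di cazzo", "I am very happy"])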

>> ["anger", "joy"]
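
For larger collections of posts, it can be convenient to tally the predicted labels. The sketch below assumes only what the example above shows: an object whose `predict` method takes a list of strings and returns one label per string. The names `emotion_counts` and `KeywordStub`, and the `batch_size` parameter, are hypothetical and not part of xlm-emo. The stub stands in for the real model so the snippet runs without downloading anything.

```python
from collections import Counter


def emotion_counts(classifier, texts, batch_size=32):
    """Count predicted emotion labels over texts, calling predict() in batches.

    `classifier` is any object with predict(list_of_str) -> list_of_labels,
    which is the interface shown in the usage example above.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    texts = list(texts)
    counts = Counter()
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        counts.update(classifier.predict(batch))
    return counts


class KeywordStub:
    """Toy stand-in for demonstration only; this is NOT the XLM-EMO model."""

    def predict(self, texts):
        return ["joy" if "happy" in t.lower() else "anger" for t in texts]


if __name__ == "__main__":
    posts = ["I am very happy", "this is awful", "So happy today"]
    print(emotion_counts(KeywordStub(), posts, batch_size=2))
```

With the package installed, passing `EmotionClassifier()` in place of `KeywordStub()` should work the same way, provided `predict` behaves as in the example above.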

Models

Model	Link	Macro F1 on Test Set
XLM-EMO-T	https://huggingface.co/MilaNLProc/xlm-emo-t	0.85
XLM-EMO-B	TBD	TBD
XLM-EMO-L	TBD	TBD
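
The table reports macro F1, which averages the per-class F1 scores so that every emotion counts equally regardless of how often it occurs. The sketch below implements the standard definition in plain Python. It is not taken from the paper's evaluation code, and the labels in the example are made up for illustration.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("at least one example is required")

    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is never zero here
        # because each label appears in y_true or y_pred, but guard anyway.
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    gold = ["joy", "joy", "anger", "sadness"]
    pred = ["joy", "anger", "anger", "sadness"]
    # Per-class F1: anger 2/3, joy 2/3, sadness 1, so the mean is 7/9.
    print(round(macro_f1(gold, pred), 4))
```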

Reference

If you use this tool please cite the following paper:

@inproceedings{bianchi-etal-2022-xlmemo,
title = {{XLM-EMO}: Multilingual Emotion Prediction in Social Media Text},
author = "Bianchi, Federico and Nozza, Debora and Hovy, Dirk",
booktitle = "Proceedings of the 12th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis",
year = "2022",
publisher = "Association for Computational Linguistics"
}

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

Owner

MilaNLP
