EmoBERT-MLOps

The goal of this repository is to build an end-to-end MLOps pipeline based on the MLOps course from Made with ML. This project differs from the course in its design, tools, and frameworks, with the aim of practicing the material and offering a different angle on, and implementation of, the original course.

This project uses a BERT model for emotion classification and is based on the GoEmotions dataset.

Content list

TODO

Dataset description

Taken from https://ai.googleblog.com/2021/10/goemotions-dataset-for-fine-grained.html

In “GoEmotions: A Dataset of Fine-Grained Emotions”, we describe GoEmotions, a human-annotated dataset of 58k Reddit comments extracted from popular English-language subreddits and labeled with 27 emotion categories. As the largest fully annotated English language fine-grained emotion dataset to date, we designed the GoEmotions taxonomy with both psychology and data applicability in mind. In contrast to the basic six emotions, which include only one positive emotion (joy), our taxonomy includes 12 positive, 11 negative, 4 ambiguous emotion categories and 1 “neutral”, making it widely suitable for conversation understanding tasks that require a subtle differentiation between emotion expressions.
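The taxonomy described above can be sketched as a label space for a classifier. The snippet below is only an illustration, not this repository's code: the emotion names and their sentiment grouping are assumed from the published GoEmotions label list and should be checked against the official release, the names `TAXONOMY`, `LABELS`, and `encode` are hypothetical, and the multi-hot encoding assumes a comment may carry more than one emotion.

```python
# Sentiment grouping of the GoEmotions taxonomy.
# Assumption: names and grouping follow the dataset's published label list;
# verify against the official release before relying on them.
TAXONOMY = {
    "positive": [
        "admiration", "amusement", "approval", "caring", "desire", "excitement",
        "gratitude", "joy", "love", "optimism", "pride", "relief",
    ],
    "negative": [
        "anger", "annoyance", "disappointment", "disapproval", "disgust",
        "embarrassment", "fear", "grief", "nervousness", "remorse", "sadness",
    ],
    "ambiguous": ["confusion", "curiosity", "realization", "surprise"],
}
NEUTRAL = "neutral"

# 27 emotion categories in alphabetical order, with "neutral" appended last.
LABELS = sorted(name for group in TAXONOMY.values() for name in group) + [NEUTRAL]
LABEL_TO_INDEX = {name: index for index, name in enumerate(LABELS)}


def encode(emotions):
    """Return a multi-hot vector over LABELS for the given emotion names.

    Hypothetical helper: assumes a comment can have several labels at once.
    """
    vector = [0] * len(LABELS)
    for emotion in emotions:
        vector[LABEL_TO_INDEX[emotion]] = 1
    return vector


if __name__ == "__main__":
    print(len(LABELS))                 # size of the classifier's output layer
    print(encode(["joy", "surprise"]))  # two positions set to 1
```

With this layout, a classification head on top of BERT would need one output per entry in `LABELS`, and a per-label sigmoid would be a natural fit if the multi-label assumption holds.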

Model description

TODO

Owner

Dimitre Oliveira
