This is a general repo that helps you develop fast, effective NLP classifiers using Hugging Face.

Last update: Mar 11, 2022

NLP Classifier

Introduction

This project trains a BERT model on any NLP classification task, then uses the trained model to make predictions on new data with batch_inference.py. The architecture can be extended to cover other models.
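As a rough illustration of the batch-inference idea, the sketch below splits a list of texts into fixed-size batches and passes each batch to a prediction function. It is not the code in batch_inference.py: `iter_batches`, `batch_predict`, and `keyword_classifier` are hypothetical names, and the keyword rule is only a stand-in for a trained model.

```python
from typing import Callable, Iterator, List, Sequence


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of `items`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def batch_predict(
    texts: Sequence[str],
    predict_fn: Callable[[List[str]], List[str]],
    batch_size: int = 32,
) -> List[str]:
    """Run `predict_fn` over `texts` one batch at a time and collect the labels."""
    labels: List[str] = []
    for batch in iter_batches(texts, batch_size):
        labels.extend(predict_fn(batch))
    return labels


def keyword_classifier(batch: List[str]) -> List[str]:
    """Hypothetical placeholder for a trained model: labels by a single keyword."""
    return ["positive" if "good" in text.lower() else "negative" for text in batch]


if __name__ == "__main__":
    reviews = ["Good movie", "bad plot", "so good"]
    print(batch_predict(reviews, keyword_classifier, batch_size=2))
```

In practice `predict_fn` would wrap the trained classifier, and the batch size would be chosen to fit the available memory.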

Installation

Set up

$ git clone https://github.com/abdullahtarek/nlp_classifier.git
$ cd nlp_classifier
Move train.csv and test.csv into the data folder.
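Before training, it can help to confirm that both files are where the scripts expect them. The following is a minimal sketch, assuming the folder is named data and the files are ordinary CSVs with a header row. `check_data_folder` and `summarize_csv` are hypothetical helpers, not part of the repo.

```python
import csv
from pathlib import Path
from typing import Dict, List, Sequence, Tuple


def summarize_csv(path: Path) -> Tuple[List[str], int]:
    """Return the header columns and the number of data rows in a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])      # first row is assumed to be the header
        row_count = sum(1 for _ in reader)
    return header, row_count


def check_data_folder(
    data_dir: str = "data",
    required: Sequence[str] = ("train.csv", "test.csv"),
) -> Dict[str, Tuple[List[str], int]]:
    """Raise if a required file is missing, otherwise summarize each file."""
    folder = Path(data_dir)
    missing = [name for name in required if not (folder / name).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing in {folder}: {', '.join(missing)}")
    return {name: summarize_csv(folder / name) for name in required}


if __name__ == "__main__":
    for name, (columns, rows) in check_data_folder().items():
        print(f"{name}: columns={columns}, rows={rows}")
```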

Python

$ pip install -r requirements.txt
Copy the training or testing dataset into the "data" folder, then run one of:
$ python training.py
$ python batch_inference.py

Docker

$ docker build . -t nlp_classifier
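$ docker run -it -v $DATA_FOLDER:/app/data -v $LOCAL_SAVED_MODEL_FOLDER:/app/saved_models nlp_classifier python batch_inference.py
To train instead, replace python batch_inference.py with python training.py. $DATA_FOLDER and $LOCAL_SAVED_MODEL_FOLDER are the host directories mounted into the container.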

Extra options

Managing Configurations

All configurations live in the conf folder, where you can change the data path, model path, etc.
You can also override a configuration with a command-line flag when running a script. Append --help to the python command to see which settings can be changed, for example: python3 batch_inference.py --help
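The pattern of file-level defaults overridden by command-line flags can be sketched with the standard library's argparse. This is only an illustration of the idea, not the repo's actual configuration mechanism: the keys in `DEFAULTS` (`data_path`, `model_path`, `batch_size`) and their values are assumptions made for the example.

```python
import argparse
from typing import Any, Dict, List, Optional

# Hypothetical defaults standing in for values read from the conf folder.
DEFAULTS: Dict[str, Any] = {
    "data_path": "data/test.csv",
    "model_path": "saved_models",
    "batch_size": 32,
}


def build_parser(defaults: Dict[str, Any]) -> argparse.ArgumentParser:
    """Create one optional flag per default, typed after the default's value."""
    parser = argparse.ArgumentParser(description="Override configuration values.")
    for key, value in defaults.items():
        parser.add_argument(
            f"--{key}",
            type=type(value),
            default=value,
            help=f"override {key} (default: {value})",
        )
    return parser


def resolve_config(
    argv: Optional[List[str]] = None,
    defaults: Dict[str, Any] = DEFAULTS,
) -> Dict[str, Any]:
    """Merge defaults with any flags given in `argv` (or sys.argv if None)."""
    return vars(build_parser(defaults).parse_args(argv))


if __name__ == "__main__":
    print(resolve_config())
```

Running such a script with --help lists every flag along with its default, which mirrors the behaviour described above.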

Owner

Abdullah Tarek
