Repository for Project Insight: NLP as a Service

Last update: Dec 06, 2022

Overview

Project Insight

NLP as a Service

Introduction
- Features
Installation
- Setup and Documentation
Project Details
License

Introduction

Project Insight is designed to provide NLP as a service, with a code base for both a front-end GUI (Streamlit) and a back-end server (FastAPI), and it demonstrates the use of transformer models on various downstream NLP tasks.

The downstream NLP tasks covered:

News Classification
Entity Recognition
Sentiment Analysis
Summarization
Information Extraction (to do)

The user can select different models from the drop-down menu to run inference.

Users can also call the FastAPI backend server directly to run inference from the command line.
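As a minimal sketch of what calling the backend from a script might look like: the host and `/api/v1` prefix come from the documentation URLs listed in this README, but the `/predict` route and the `text` and `model` payload fields are assumptions made for illustration. Check each service's `/docs` page for the actual routes and schema.

```python
import json
import urllib.request

# Host and prefix taken from the documentation URLs in this README.
BASE_URL = "http://localhost:8080/api/v1"


def build_request(task: str, text: str, model: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one NLP task.

    The "/predict" route and the "text"/"model" fields are hypothetical.
    """
    payload = json.dumps({"text": text, "model": model}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{task}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_inference(task: str, text: str, model: str) -> dict:
    """Send the request and decode the JSON reply (needs a running backend)."""
    request = build_request(task, text, model)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

With the backend running, `run_inference("sentiment", "Great movie", "distilbert")` would send the request; `build_request` alone is enough to inspect what would be sent.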

Features of the solution

Python Code Base: Built using FastAPI and Streamlit, so the complete code base is in Python.
Expandable: The backend is designed so that it can be extended with more Transformer-based models, which then become available in the front-end app automatically.
Micro-Services: The backend is designed with a microservices architecture, with a Dockerfile for each service and Nginx acting as a reverse proxy to each independently running service.
- This makes it easy to update, maintain, start and stop individual NLP services.
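One way the "Expandable" behaviour could work is for the front end to build its model drop-down from config.json. The sketch below assumes the key layout shown in the "How to Add a new Model" section of this README; the `sentiment` entries and the `model_names` helper are hypothetical and are not taken from the repository.

```python
import json
from typing import List

# Inline stand-in for config.json. The layout follows the example in this
# README; the "sentiment" entries are made up for illustration.
CONFIG_TEXT = """
{
    "classification": {
        "model-1": {"name": "DistilBERT", "info": "Model Info"}
    },
    "sentiment": {
        "model-1": {"name": "DistilBERT", "info": "Model Info"},
        "model-2": {"name": "RoBERTa", "info": "Model Info"}
    }
}
"""


def model_names(config: dict, task: str) -> List[str]:
    """Return the display name of every model registered for a task."""
    return [entry["name"] for entry in config.get(task, {}).values()]


config = json.loads(CONFIG_TEXT)
print(model_names(config, "sentiment"))
```

Under this assumption, adding an entry to config.json is all the front end needs in order to list a new model.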

Installation

Clone the Repo.
Run Docker Compose to spin up the FastAPI-based backend service.
Run the Streamlit app with the streamlit run command.

Setup and Documentation

Download the models
- Download the models from here
- Save them in the specific model folders inside the src_fastapi folder.
Running the backend service.
- Go to the src_fastapi folder
- Run the Docker Compose command
```
$ cd src_fastapi
src_fastapi:~$ sudo docker-compose up -d
```
Running the frontend app.
- Go to the src_streamlit folder
- Run the app with the streamlit run command
```
$ cd src_streamlit
src_streamlit:~$ streamlit run NLPfily.py
```
Access to FastAPI Documentation: Since this is a microservice-based design, every NLP task has its own separate documentation:
- News Classification: http://localhost:8080/api/v1/classification/docs
- Sentiment Analysis: http://localhost:8080/api/v1/sentiment/docs
- NER: http://localhost:8080/api/v1/ner/docs
- Summarization: http://localhost:8080/api/v1/summary/docs
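The URLs above follow a single pattern, so they can be generated and checked from a script. This is a small sketch: the route segments are copied from the list above, while the `docs_url` and `is_up` helper names are hypothetical.

```python
import urllib.request

DOCS_BASE = "http://localhost:8080/api/v1"

# Route segment for each task, copied from the list above.
SERVICE_ROUTES = {
    "News Classification": "classification",
    "Sentiment Analysis": "sentiment",
    "NER": "ner",
    "Summarization": "summary",
}


def docs_url(task: str) -> str:
    """Return the documentation URL for a task."""
    return f"{DOCS_BASE}/{SERVICE_ROUTES[task]}/docs"


def is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with HTTP 200, False on network or HTTP errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError, HTTPError and socket timeouts are all OSError subclasses
        return False


if __name__ == "__main__":
    for task in SERVICE_ROUTES:
        print(task, docs_url(task), "up" if is_up(docs_url(task)) else "down")
```

Running the script after `docker-compose up -d` gives a quick view of which services have started.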

Project Details

Demonstration

Directory Details

Front End: The front-end code is in the src_streamlit folder, along with its Dockerfile and requirements.txt.
Back End: The back-end code is in the src_fastapi folder.
- This folder contains a directory for each task: classification, ner, summary, etc.
- Each NLP task is implemented as a microservice with its own FastAPI server, requirements and Dockerfile, so that the services can be maintained and managed independently.
- Each NLP task has its own folder, and within it each trained model has its own folder. For example:
```
- sentiment
    > app
        > api
            > distilbert
                - model.bin
                - network.py
                - tokeniser files
            > roberta
                - model.bin
                - network.py
                - tokeniser files
```
- Each new model added to a service needs its own new folder.
- Each model folder needs the following files:
  - Model bin file.
  - Tokenizer files.
  - network.py, defining the model class, if a customised model is used.
- config.json: This file contains the details of the models in the backend and the dataset they are trained on.
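The file requirements above can be checked before a service is started. The following is a minimal sketch, not part of the repository: `missing_model_files` is a hypothetical helper, the `model.bin` and `network.py` names come from the example tree above, and because tokenizer file names vary by model the check only assumes that such files contain "token" or "vocab" in their name.

```python
from pathlib import Path
from typing import List


def missing_model_files(model_dir: Path, custom_network: bool = False) -> List[str]:
    """List what a model folder lacks, based on the layout described above."""
    missing = []
    if not (model_dir / "model.bin").is_file():
        missing.append("model.bin")

    # Assumption: tokenizer files have "token" or "vocab" somewhere in the name.
    names = [p.name.lower() for p in model_dir.iterdir()] if model_dir.is_dir() else []
    if not any("token" in name or "vocab" in name for name in names):
        missing.append("tokenizer files")

    # network.py is only required when a customised model class is used.
    if custom_network and not (model_dir / "network.py").is_file():
        missing.append("network.py")
    return missing
```

For example, `missing_model_files(Path("sentiment/app/api/roberta"), custom_network=True)` returns an empty list when the folder is complete.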

How to Add a new Model

Fine-tune a transformer model for the specific task. You can leverage the transformers-tutorials.
Save the model files and tokenizer files, and create a network.py script if you use a customized training network.
Create a directory inside the NLP task folder, named after the model, and save all the files in it.
Update config.json with the model details and dataset details.

Update <service>pro.py with the correct imports and the conditions under which the model is loaded. For example, to add a new BERT model to the classification task, do the following:

Create a new directory named bert in classification/app/api/.

Update config.json with the following:

"classification": {
    "model-1": {
        "name": "DistilBERT",
        "info": "This model is trained on News Aggregator Dataset from UC Irvin Machine Learning Repository. The news headlines are classified into 4 categories: **Business**, **Science and Technology**, **Entertainment**, **Health**. [New Dataset](https://archive.ics.uci.edu/ml/datasets/News+Aggregator)"
    },
    "model-2": {
        "name": "BERT",
        "info": "Model Info"
    }
}

Update classificationpro.py with the following snippets:

Import, needed only if a customized class is used:

from classification.bert import BertClass

Section where the model is selected (BertTokenizerFast is imported from the transformers library):

if model == "bert":
    self.model = BertClass()
    self.tokenizer = BertTokenizerFast.from_pretrained(self.path)
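The config.json update in the steps above can also be scripted. The sketch below assumes that config.json is a single JSON object keyed by task, with `model-N` entries as in the example; `register_model` is a hypothetical helper that is not part of the repository.

```python
import json
from pathlib import Path


def register_model(config_path: Path, task: str, name: str, info: str) -> str:
    """Add a model entry under a task and return the key it was stored under."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    models = config.setdefault(task, {})

    # Pick the first unused "model-N" key, starting after the current count.
    number = len(models) + 1
    while f"model-{number}" in models:
        number += 1
    key = f"model-{number}"

    models[key] = {"name": name, "info": info}
    config_path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return key
```

For the example above, `register_model(Path("config.json"), "classification", "BERT", "Model Info")` would add the `model-2` entry, assuming only `model-1` exists. The path to config.json depends on where the file lives in your checkout.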

License

This project is licensed under the GPL-3.0 License. See the LICENSE.md file for details.


Owner

Abhishek Kumar Mishra
