Machine learning models from Singapore's NLP research community

Related tags

Text Data & NLPsgnlp
Overview

SG-NLP

Machine learning models from Singapore's natural language processing (NLP) research community.

sgnlp is a Python package that allows you to easily get started on using various (NLP) models implemented using the Pytorch and Transfromers frameworks.

We have an accompanying demo site where you can interact with our models and get a better understanding on how they work.

Installation

  • Python >= 3.8
pip install sgnlp

Documentation

Visit our documentation for tutorials.

License

Code and models from this project are released under the MIT License unless otherwise stated. If a model's code is under a separate license, it can be found in the respective model's folder.

Comments
  • Change demo api to use gevent worker

    Change demo api to use gevent worker

    • Using multiple workers of the default type 'sync' in gunicorn is not working on Kubernetes
    • Workers constantly terminated due to signal 9
    • Try gevent to see if it works out
    opened by jonheng 2
  • UFD use case tutorial and usability improvement

    UFD use case tutorial and usability improvement

    • Added additional tutorial on how to use UFD to train and evaluate on custom dataset
    • Bug fix for UFD parse_args_and_load_config util function
    • Added feature to create folder if folder doesn't exist
    • Added some train args param in eval args param to improve usability
    • Made caching optional
    • Added validation to make debugging easier
    • Added links to config file examples for reccon models
    opened by vincenttzc 1
  • Wrong assert comparison for SenticGCN dataclass

    Wrong assert comparison for SenticGCN dataclass

    Latest SenticGCN implementation for the Dev branch. In the dataclass.py, post_init method in SenticGCNTrainArgs, there are the following assertions,

    assert self.repeats > 1, "Repeats value must be at least 1."
    assert self.patience > 1, "Patience value must be at least 1." 
    

    The comparison operator should be >= instead.

    bug 
    opened by raymondng76 0
  • 47 centralized logging

    47 centralized logging

    • Create a centralized logger for 'sgnlp' base logger
    • 'sgnlp' logger is created from a config json and is init a the 'sgnlp' module init.py
    • Replace all logging method call with their own script specific logger
    opened by raymondng76 0
  • Add parent class for preprocessor

    Add parent class for preprocessor

    • [x] Create a module named sgnlp.base
    • [x] Add abstractmethods for preprocess, save, load
    • [x] Add batch iteration to parent __call__
    • [x] Parent __call__ should return a dictionary
    enhancement 
    opened by jonheng 0
  • 46 senticgcn bugfix

    46 senticgcn bugfix

    • Add multi-word aspect support
    • Update documentation to reflect multi-word support
    • Update unit tests
    • Update usage example to include multi-word support
    opened by raymondng76 0
  • Fix multi-word aspect issue with Sentic-GCN preprocessor

    Fix multi-word aspect issue with Sentic-GCN preprocessor

    The current implementation of preprocessor matches a single aspect index for the purpose of matching postprocessor output. The aspect index field for process_input payload should be expended to handle aspects with multiple indexes.

    bug 
    opened by raymondng76 0
  • Add Sentic-GCN demo_api to SGNlp

    Add Sentic-GCN demo_api to SGNlp

    Close #43

    This pull request is to add Sentic-GCN demo_api models to sgnlp. Includes the follow components:

    • model_card
    • api.py
    • dockerfiles
    • requirements.txt
    • usage.py
    opened by K-WeiMing 0
  • Add Sentic-GCN to SGNlp

    Add Sentic-GCN to SGNlp

    close #41

    This pull request is to add Sentic-GCN models to sgnlp. Includes the follow components:

    • Models
    • Configs
    • Tokenizers
    • Embedding models
    • Trainer/Evaluator
    • Unit test
    • documentation

    Does not include demo_api as it is covered in another issue tickets.

    opened by raymondng76 0
  • download_pretrained for demo API does not cache downloaded files/models

    download_pretrained for demo API does not cache downloaded files/models

    To allow the containers to start up quicker, models and files were downloaded and cached during build time.

    Recent changes in the huggingface transformers package has broken this functionality:

    • Released in v4.22.0
    • Issue

    Possible choices moving forward:

    • Write a simple caching utility function
    • Stick to versions of transformers before 4.22.0
    opened by jonheng 0
  • Add Stance Detection model

    Add Stance Detection model

    opened by atenzer 0
Releases(v0.4.0)
Owner
AI Singapore | AI Makerspace
Grow local AI talents and empowering start-ups, SMEs and enterprises with AI components, frameworks, platforms and advisory services.
AI Singapore | AI Makerspace
Use Google's BERT for named entity recognition (CoNLL-2003 as the dataset).

For better performance, you can try NLPGNN, see NLPGNN for more details. BERT-NER Version 2 Use Google's BERT for named entity recognition (CoNLL-2003

Kaiyinzhou 1.2k Dec 26, 2022
Use fastai-v2 with HuggingFace's pretrained transformers

FastHugs Use fastai v2 with HuggingFace's pretrained transformers, see the notebooks below depending on your task: Text classification: fasthugs_seq_c

Morgan McGuire 111 Nov 16, 2022
Use Tensorflow2.7.0 Build OpenAI'GPT-2

TF2_GPT-2 Use Tensorflow2.7.0 Build OpenAI'GPT-2 使用最新tensorflow2.7.0构建openai官方的GPT-2 NLP模型 优点 使用无监督技术 拥有大量词汇量 可实现续写(堪比“xx梦续写”) 实现对话后续将应用于FloatTech的Bot

Watermelon 9 Sep 13, 2022
The simple project to separate mixed voice (2 clean voices) to 2 separate voices.

Speech Separation The simple project to separate mixed voice (2 clean voices) to 2 separate voices. Result Example (Clisk to hear the voices): mix ||

vuthede 31 Oct 30, 2022
The official repository of the ISBI 2022 KNIGHT Challenge

KNIGHT The official repository holding the data for the ISBI 2022 KNIGHT Challenge About The KNIGHT Challenge asks teams to develop models to classify

Nicholas Heller 4 Jan 22, 2022
It analyze the sentiment of the user, whether it is postive or negative.

Sentiment-Analyzer-Tool It analyze the sentiment of the user, whether it is postive or negative. It uses streamlit library for creating this sentiment

Paras Patidar 18 Dec 17, 2022
In this project, we aim to achieve the task of predicting emojis from tweets. We aim to investigate the relationship between words and emojis.

Making Emojis More Predictable by Karan Abrol, Karanjot Singh and Pritish Wadhwa, Natural Language Processing (CSE546) under the guidance of Dr. Shad

Karanjot Singh 2 Jan 17, 2022
Kerberoast with ACL abuse capabilities

targetedKerberoast targetedKerberoast is a Python script that can, like many others (e.g. GetUserSPNs.py), print "kerberoast" hashes for user accounts

Shutdown 213 Dec 22, 2022
Control the classic General Instrument SP0256-AL2 speech chip and AY-3-8910 sound generator with a Raspberry Pi and this Python library.

GI-Pi Control the classic General Instrument SP0256-AL2 speech chip and AY-3-8910 sound generator with a Raspberry Pi and this Python library. The SP0

Nick Bild 8 Dec 15, 2021
A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks

A Deep Learning NLP/NLU library by Intel® AI Lab Overview | Models | Installation | Examples | Documentation | Tutorials | Contributing NLP Architect

Intel Labs 2.9k Dec 31, 2022
Poetry PEP 517 Build Backend & Core Utilities

Poetry Core A PEP 517 build backend implementation developed for Poetry. This project is intended to be a light weight, fully compliant, self-containe

Poetry 293 Jan 02, 2023
The Easy-to-use Dialogue Response Selection Toolkit for Researchers

The Easy-to-use Dialogue Response Selection Toolkit for Researchers

GMFTBY 32 Nov 13, 2022
PIZZA - a task-oriented semantic parsing dataset

The PIZZA dataset continues the exploration of task-oriented parsing by introducing a new dataset for parsing pizza and drink orders, whose semantics cannot be captured by flat slots and intents.

17 Dec 14, 2022
Legal text retrieval for python

legal-text-retrieval Overview This system contains 2 steps: generate training data containing negative sample found by mixture score of cosine(tfidf)

Nguyễn Minh Phương 22 Dec 06, 2022
Large-scale pretraining for dialogue

A State-of-the-Art Large-scale Pretrained Response Generation Model (DialoGPT) This repository contains the source code and trained model for a large-

Microsoft 1.8k Jan 07, 2023
🕹 An esoteric language designed so that the program looks like the transcript of a Pokémon battle

PokéBattle is an esoteric language designed so that the program looks like the transcript of a Pokémon battle. Original inspiration and specification

Eduardo Correia 9 Jan 11, 2022
State-of-the-art NLP through transformer models in a modular design and consistent APIs.

Trapper (Transformers wRAPPER) Trapper is an NLP library that aims to make it easier to train transformer based models on downstream tasks. It wraps h

Open Business Software Solutions 42 Sep 21, 2022
Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing

Introduction Funnel-Transformer is a new self-attention model that gradually compresses the sequence of hidden states to a shorter one and hence reduc

GUOKUN LAI 197 Dec 11, 2022
GooAQ 🥑 : Google Answers to Google Questions!

This repository contains the code/data accompanying our recent work on long-form question answering.

AI2 112 Nov 06, 2022
Chinese NER(Named Entity Recognition) using BERT(Softmax, CRF, Span)

Chinese NER(Named Entity Recognition) using BERT(Softmax, CRF, Span)

Weitang Liu 1.6k Jan 03, 2023