Machine learning models from Singapore's NLP research community

Last update: Dec 17, 2022

Overview

SG-NLP

Machine learning models from Singapore's natural language processing (NLP) research community.

sgnlp is a Python package that lets you get started quickly with various NLP models implemented using the PyTorch and Transformers frameworks.

We have an accompanying demo site where you can interact with our models and get a better understanding of how they work.

Installation

Python >= 3.8

pip install sgnlp

Documentation

Visit our documentation for tutorials.

License

Code and models from this project are released under the MIT License unless otherwise stated. If a model's code is under a separate license, it can be found in the respective model's folder.

Comments

Change demo api to use gevent worker
Using multiple workers of the default type 'sync' in gunicorn is not working on Kubernetes

Workers are constantly terminated by signal 9

Try gevent to see if it works out
opened by jonheng 2
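As an illustration of the gevent change described above, here is a minimal sketch of a gunicorn configuration file. The file name, port, and numeric values are assumptions chosen for the example and are not taken from the sgnlp repository; only the setting names (bind, worker_class, workers, worker_connections, timeout) are standard gunicorn settings.

```python
# gunicorn.conf.py (hypothetical file; values are illustrative assumptions)

# Address the demo API listens on (assumed port).
bind = "0.0.0.0:8000"

# Use gevent workers instead of the default "sync" worker type.
# This requires the gevent package to be installed alongside gunicorn.
worker_class = "gevent"

# Number of worker processes. Each process loads its own copy of the model,
# so a small number keeps memory use within the container's limit.
workers = 2

# Maximum number of simultaneous clients handled by each gevent worker.
worker_connections = 100

# Seconds a worker may stay silent before it is restarted; model loading
# can be slow, so this is set above the 30-second default.
timeout = 120
```

Signal 9 (SIGKILL) on Kubernetes is often a sign that the container exceeded its memory limit, so it may also be worth checking memory usage per worker before concluding that the worker type is the cause.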
UFD use case tutorial and usability improvement
Added additional tutorial on how to use UFD to train and evaluate on custom dataset

Bug fix for UFD parse_args_and_load_config util function

Added feature to create folder if folder doesn't exist

Added some train args param in eval args param to improve usability

Made caching optional

Added validation to make debugging easier

Added links to config file examples for reccon models
opened by vincenttzc 1
Wrong assert comparison for SenticGCN dataclass
In the latest SenticGCN implementation on the dev branch, the __post_init__ method of SenticGCNTrainArgs in dataclass.py contains the following assertions:

assert self.repeats > 1, "Repeats value must be at least 1."
assert self.patience > 1, "Patience value must be at least 1."

The comparison operator should be >= instead.
bug
opened by raymondng76 0
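The SenticGCN assertion issue above can be reproduced in isolation. The following sketch uses a hypothetical stand-in dataclass (TrainArgsSketch, not the real SenticGCNTrainArgs, which has many more fields) to show the corrected comparison: with >=, a value of 1 is accepted, which matches the wording of the error messages.

```python
from dataclasses import dataclass


@dataclass
class TrainArgsSketch:
    """Hypothetical stand-in for SenticGCNTrainArgs with only two fields."""

    repeats: int = 1
    patience: int = 1

    def __post_init__(self):
        # ">=" matches the messages: a value of exactly 1 is valid.
        assert self.repeats >= 1, "Repeats value must be at least 1."
        assert self.patience >= 1, "Patience value must be at least 1."


# Accepted with ">=", but would have been rejected by the original "> 1" check.
args = TrainArgsSketch(repeats=1, patience=1)
```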
47 centralized logging
Create a centralized logger for 'sgnlp' base logger

The 'sgnlp' logger is created from a JSON config and is initialized in the 'sgnlp' module's __init__.py

Replace all direct logging method calls with a script-specific logger
opened by raymondng76 0
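The centralized logging approach described above can be sketched with the standard library alone. This is an assumption about the general shape, not the project's actual configuration: the config contents, the setup_logging helper, and the module name sgnlp.models.example are all hypothetical. The config is written inline as a dictionary here; the same structure could be read from a JSON file with json.load.

```python
import logging
import logging.config

# Hypothetical config; in the approach above it would be read from a JSON file.
LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "default": {"format": "%(asctime)s %(name)s %(levelname)s %(message)s"}
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "default",
            "level": "INFO",
        }
    },
    "loggers": {
        # Base logger shared by every module in the package.
        "sgnlp": {"handlers": ["console"], "level": "INFO", "propagate": False}
    },
}


def setup_logging(config=LOGGING_CONFIG):
    """Configure the base 'sgnlp' logger once, e.g. from the package __init__.py."""
    logging.config.dictConfig(config)


setup_logging()

# Each script asks for its own child logger instead of calling logging.info().
# In a real module this would usually be logging.getLogger(__name__).
logger = logging.getLogger("sgnlp.models.example")
logger.info("Child loggers inherit the handlers and level of 'sgnlp'.")
```

Because child loggers propagate to the 'sgnlp' base logger, handlers and levels only need to be configured in one place.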
Add parent class for preprocessor
[x] Create a module named sgnlp.base

[x] Add abstractmethods for preprocess, save, load

[x] Add batch iteration to parent __call__

[x] Parent __call__ should return a dictionary

enhancement
opened by jonheng 0
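The checklist above (abstract preprocess, save and load methods, batch iteration in the parent __call__, and a dictionary return value) could look roughly like the sketch below. The class names, the batch_size parameter, and the merge strategy are assumptions made for illustration and do not reflect the actual contents of sgnlp.base.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Sequence


class PreprocessorBase(ABC):
    """Hypothetical parent class; subclasses implement preprocess, save, load."""

    def __init__(self, batch_size: int = 2):
        self.batch_size = batch_size

    @abstractmethod
    def preprocess(self, batch: Sequence[Any]) -> Dict[str, List[Any]]:
        """Turn one batch of raw inputs into a dictionary of lists."""

    @abstractmethod
    def save(self, path: str) -> None:
        """Persist any preprocessor state to path."""

    @abstractmethod
    def load(self, path: str) -> None:
        """Restore preprocessor state from path."""

    def __call__(self, data: Sequence[Any]) -> Dict[str, List[Any]]:
        # Iterate over the input in batches and merge the per-batch
        # dictionaries into a single dictionary of lists.
        output: Dict[str, List[Any]] = {}
        for start in range(0, len(data), self.batch_size):
            batch = data[start:start + self.batch_size]
            for key, values in self.preprocess(batch).items():
                output.setdefault(key, []).extend(values)
        return output


class WhitespacePreprocessor(PreprocessorBase):
    """Toy subclass used only to exercise the parent class."""

    def preprocess(self, batch):
        return {"tokens": [text.split() for text in batch]}

    def save(self, path):
        pass  # nothing to persist in this toy example

    def load(self, path):
        pass  # nothing to restore in this toy example


result = WhitespacePreprocessor(batch_size=2)(["a b", "c", "d e f"])
print(result)
```

Keeping the batching logic in the parent means each model-specific preprocessor only has to describe how to handle a single batch.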
46 senticgcn bugfix
Add multi-word aspect support

Update documentation to reflect multi-word support

Update unit tests

Update usage example to include multi-word support
opened by raymondng76 0
Fix multi-word aspect issue with Sentic-GCN preprocessor

The current preprocessor implementation matches a single aspect index for the purpose of matching postprocessor output. The aspect index field in the process_input payload should be expanded to handle aspects with multiple indexes.
bug

opened by raymondng76 0
Add Sentic-GCN demo_api to SGNlp
Close #43

This pull request adds Sentic-GCN demo_api models to sgnlp. It includes the following components:

model_card

api.py

dockerfiles

requirements.txt

usage.py
opened by K-WeiMing 0
Add Sentic-GCN to SGNlp
close #41

This pull request adds Sentic-GCN models to sgnlp. It includes the following components:

Models

Configs

Tokenizers

Embedding models

Trainer/Evaluator

Unit test

documentation

Does not include demo_api, as it is covered in another issue ticket.
opened by raymondng76 0
download_pretrained for demo API does not cache downloaded files/models
To allow the containers to start up quicker, models and files were downloaded and cached during build time.

Recent changes in the Hugging Face transformers package have broken this functionality:

Released in v4.22.0

Issue

Possible choices moving forward:

Write a simple caching utility function

Stick to versions of transformers before 4.22.0
opened by jonheng 0
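For the first option above, a simple caching utility might look like the following sketch. It is an illustration under stated assumptions, not the fix adopted by the project: the function names, the hash-based file naming, and the injectable fetch parameter are all hypothetical, and only the Python standard library is used.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Callable, Union


def _http_fetch(url: str) -> bytes:
    """Download the resource at url and return its raw bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def cached_download(
    url: str,
    cache_dir: Union[str, Path],
    fetch: Callable[[str], bytes] = _http_fetch,
) -> Path:
    """Return a local path for url, downloading only if it is not cached yet."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Name the cached file after a hash of the URL so that the same URL
    # always maps to the same file.
    target = cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()

    if not target.exists():
        # Write to a temporary file first so an interrupted download does
        # not leave a truncated file that looks like a valid cache entry.
        tmp = target.with_suffix(".tmp")
        tmp.write_bytes(fetch(url))
        tmp.replace(target)
    return target
```

Calling cached_download during the image build would populate the cache directory, and the same call at container start-up would then return the existing file without touching the network. The fetch parameter exists so the function can be tested without network access.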
Add Stance Detection model

Paper: https://aclanthology.org/2020.emnlp-main.108.pdf

Prof: Jiang Jing from SMU

Repo: GitHub - jefferyYu/DualHierarchicalTransformer: Predicting Stance and Rumor Veracity via Dual Hierarchical Transformer

opened by atenzer 0

Releases(v0.4.0)

v0.4.0(Oct 7, 2022)

New model: Coherence Momentum Model
v0.3.0(Apr 22, 2022)
New models:

Sentic GCN

LIF

UFD

v0.2.0(Oct 19, 2021)
New models:

RST Pointer

GEC

v0.1.1(Aug 26, 2021)

Bug fix on rumour detection module paths
v0.1.0(Aug 26, 2021)

Removed UFD for further review.

Refactoring and improvements to LSR and Rumour detection models.
v0.0.1(Aug 5, 2021)
Initial release of sgnlp.

Models included:

RECCON

LSR

UFD

Rumour detection twitter


Owner

AI Singapore | AI Makerspace

Growing local AI talent and empowering start-ups, SMEs and enterprises with AI components, frameworks, platforms and advisory services.
