A text augmentation tool for named entity recognition.

Last update: Oct 11, 2022

Overview

neraug

This Python library helps you augment text data for named entity recognition.

Augmentation Example

Figure: augmentation example, taken from An Analysis of Simple Data Augmentation for Named Entity Recognition.

Installation

To install the library:

pip install neraug

Usage

The following example uses one of the algorithms, DictionaryReplacement:

>>> from neraug.augmentator import DictionaryReplacement
>>> from neraug.scheme import IOBES

>>> ne_dic = {'Tokyo Big Sight': 'LOC'}
>>> augmentator = DictionaryReplacement(ne_dic, str.split, IOBES)
>>> x = ['I', 'went', 'to', 'Tokyo']
>>> y = ['O', 'O', 'O', 'S-LOC']
>>> x_augs, y_augs = augmentator.augment(x, y, n=1)
>>> x_augs
[['I', 'went', 'to', 'Tokyo', 'Big', 'Sight']]
>>> y_augs
[['O', 'O', 'O', 'B-LOC', 'I-LOC', 'E-LOC']]
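
To make the transformation above concrete, here is a minimal, self-contained sketch of the idea behind dictionary replacement: each mention is swapped for a dictionary entry of the same entity type, and the new mention is re-tagged in IOBES. It does not import neraug and is not the library's implementation; the helper names `iobes_spans`, `iobes_tags`, and `replace_from_dictionary` are hypothetical and exist only for illustration.

```python
import random


def iobes_spans(tags):
    """Return (start, end, type) for each mention in an IOBES sequence (end exclusive)."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        prefix, _, etype = tag.partition("-")
        if prefix == "S":
            spans.append((i, i + 1, etype))
        elif prefix == "B":
            start = i
        elif prefix == "E" and start is not None:
            spans.append((start, i + 1, etype))
            start = None
    return spans


def iobes_tags(length, etype):
    """Tag a mention of `length` tokens in IOBES."""
    if length == 1:
        return [f"S-{etype}"]
    return [f"B-{etype}"] + [f"I-{etype}"] * (length - 2) + [f"E-{etype}"]


def replace_from_dictionary(x, y, ne_dic, tokenize=str.split, rng=random):
    """Swap each mention for a random dictionary entry of the same type (illustrative only)."""
    # Group the tokenized dictionary entries by entity type.
    by_type = {}
    for surface, etype in ne_dic.items():
        by_type.setdefault(etype, []).append(tokenize(surface))

    new_x, new_y, cursor = [], [], 0
    for start, end, etype in iobes_spans(y):
        # Copy the untouched tokens that precede this mention.
        new_x += x[cursor:start]
        new_y += y[cursor:start]
        candidates = by_type.get(etype)
        if candidates:
            mention = rng.choice(candidates)
            new_x += mention
            new_y += iobes_tags(len(mention), etype)
        else:
            # No dictionary entry of this type: keep the original mention.
            new_x += x[start:end]
            new_y += y[start:end]
        cursor = end
    # Copy whatever follows the last mention.
    new_x += x[cursor:]
    new_y += y[cursor:]
    return new_x, new_y


ne_dic = {"Tokyo Big Sight": "LOC"}
x = ["I", "went", "to", "Tokyo"]
y = ["O", "O", "O", "S-LOC"]
print(replace_from_dictionary(x, y, ne_dic))
```

With a single dictionary entry the result is deterministic and matches the output shown above. With several entries per type, the replacement is chosen at random, so repeated calls can yield different augmented sentences.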

The library supports the following algorithms:

DictionaryReplacement
LabelWiseTokenReplacement
MentionReplacement
ShuffleWithinSegment

and the following tagging schemes:

IOB2
IOBES
BILOU
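
The three schemes differ only in how they mark the boundaries of a mention. The sketch below tags the same entity spans under each scheme to show the difference. It is a standalone illustration that does not use neraug; the function `encode` and its span format `(start, end, type)` are assumptions made for this example.

```python
def encode(length, spans, scheme="IOBES"):
    """Tag `length` tokens given (start, end, type) spans, with end exclusive."""
    # Prefixes for (first token, inner token, last token, single-token mention).
    prefixes = {
        "IOB2": ("B", "I", "I", "B"),
        "IOBES": ("B", "I", "E", "S"),
        "BILOU": ("B", "I", "L", "U"),
    }
    begin, inside, last, single = prefixes[scheme]
    tags = ["O"] * length
    for start, end, etype in spans:
        if end - start == 1:
            tags[start] = f"{single}-{etype}"
            continue
        tags[start] = f"{begin}-{etype}"
        for i in range(start + 1, end - 1):
            tags[i] = f"{inside}-{etype}"
        tags[end - 1] = f"{last}-{etype}"
    return tags


tokens = ["I", "went", "to", "Tokyo", "Big", "Sight", "in", "Tokyo"]
spans = [(3, 6, "LOC"), (7, 8, "LOC")]
for scheme in ("IOB2", "IOBES", "BILOU"):
    print(scheme, encode(len(tokens), spans, scheme))
```

IOB2 marks only the beginning of a mention, while IOBES and BILOU also mark its last token and single-token mentions, which is why the augmented example above ends in `E-LOC`.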

Reference

Thanks to the following research:

An Analysis of Simple Data Augmentation for Named Entity Recognition

Citation

@misc{neraug,
  title={neraug: A data augmentation tool for named entity recognition},
  author={Hiroki Nakayama},
  url={https://github.com/Hironsan/neraug},
  year={2021}
}

Releases

v0.1.1 (Jul 22, 2021)

Remove tokenizer from MentionReplacement

v0.1.0 (Jul 22, 2021)

Owner

Hiroki Nakayama
