Negative sampling for solving the unlabeled entity problem in NER. ICLR-2021 paper: Empirical Analysis of Unlabeled Entity Problem in Named Entity Recognition.

Last update: Dec 29, 2022

Overview

Negative Sampling for NER

The unlabeled entity problem is prevalent in many NER scenarios (e.g., weakly supervised NER). Our ICLR 2021 paper proposes negative sampling to address this issue. This repo contains the implementation of our approach.
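To illustrate the idea, here is a minimal Python sketch of span-level negative sampling: every span that is not annotated as an entity is a candidate, and only a small random subset is used as negative ("O") instances, so an unlabeled entity is rarely treated as a negative. The function name `sample_negative_spans`, the `rate` parameter, and the `ceil(rate * num_tokens)` budget are assumptions made for illustration; they are not necessarily what this repo implements.

```python
import math
import random


def sample_negative_spans(num_tokens, labeled_entities, rate=0.3, rng=None):
    """Sample unlabeled spans to serve as negative ("O") training instances.

    labeled_entities: iterable of (start, end, label) with inclusive token indices.
    rate: hypothetical hyperparameter; the budget is ceil(rate * num_tokens).
    """
    rng = rng or random.Random()
    positives = {(start, end) for start, end, _ in labeled_entities}

    # Every span (i, j) with i <= j that is not annotated is a candidate.
    candidates = [
        (i, j)
        for i in range(num_tokens)
        for j in range(i, num_tokens)
        if (i, j) not in positives
    ]

    budget = min(len(candidates), math.ceil(rate * num_tokens))
    return [(i, j, "O") for i, j in rng.sample(candidates, budget)]


tokens = ["Leicestershire", "22", "points", ",", "Somerset", "4", "."]
gold = [(0, 0, "ORG"), (4, 4, "ORG")]
negatives = sample_negative_spans(len(tokens), gold, rate=0.3, rng=random.Random(0))
print(negatives)
```

In this sketch the labeled spans and the sampled negatives together would form the training instances for one sentence; all remaining spans are simply left out of the loss.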

Note that this is not an officially supported Tencent product.

Preparation

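Preparation takes two steps. First, convert the NER data to the format below and place it in a new folder named "dataset". The folder must contain train.json, dev.json, and test.json. Each JSON file is a list of dicts, where each entity is a (start, end, label) tuple of inclusive token indices. For example: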

[
 {
  "sentence": "['Somerset', '83', 'and', '174', '(', 'P.', 'Simmons', '4-38', ')', ',', 'Leicestershire', '296', '.']",
  "labeled entities": "[(0, 0, 'ORG'), (5, 6, 'PER'), (10, 10, 'ORG')]"
 },
 {
  "sentence": "['Leicestershire', '22', 'points', ',', 'Somerset', '4', '.']",
  "labeled entities": "[(0, 0, 'ORG'), (4, 4, 'ORG')]"
 }
]
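If your data is in CoNLL-style BIO tags, a conversion along the following lines may help. This is a minimal sketch, not part of the repo: `bio_to_spans` and `make_record` are hypothetical helper names, and it assumes both fields are stringified Python literals, as the example above suggests.

```python
import ast
import json


def bio_to_spans(tags):
    """Convert BIO tags to (start, end, label) spans with inclusive indices."""
    spans, start, label = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        # Close the open span unless the current tag continues it.
        if start is not None and not (tag.startswith("I-") and tag[2:] == label):
            spans.append((start, i - 1, label))
            start, label = None, None
        # Open a new span on B-, or on an I- tag that has nothing to continue.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


def make_record(tokens, tags):
    """Build one record whose fields are stringified lists (assumed format)."""
    return {
        "sentence": str(list(tokens)),
        "labeled entities": str(bio_to_spans(tags)),
    }


tokens = ["Leicestershire", "22", "points", ",", "Somerset", "4", "."]
tags = ["B-ORG", "O", "O", "O", "B-ORG", "O", "O"]
record = make_record(tokens, tags)
print(json.dumps([record], indent=1))

# Reading a record back: parse the string fields as Python literals.
entities = ast.literal_eval(record["labeled entities"])
```

Writing a list of such records with `json.dump` to dataset/train.json, dataset/dev.json, and dataset/test.json would give the layout described above.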

Second, prepare the pretrained LM (i.e., BERT) and the evaluation script. Create a directory named "resource" and arrange the files as follows:

resource
- bert-base-cased
  - model.pt
  - vocab.txt
- conlleval.pl

Note that the files in BERT.tar.gz need to be renamed as above.
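Before training, it can be useful to confirm that the layout is in place. The check below is a small sketch, not part of the repo; `missing_resources` is a hypothetical helper, and the file names are taken from the listing above.

```python
from pathlib import Path

# Paths taken from the layout above, relative to the resource directory.
EXPECTED_FILES = (
    "bert-base-cased/model.pt",
    "bert-base-cased/vocab.txt",
    "conlleval.pl",
)


def missing_resources(resource_dir="resource"):
    """Return the expected files that are not present under resource_dir."""
    root = Path(resource_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


missing = missing_resources("resource")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("Resource directory looks complete.")
```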

Training and Test

CUDA_VISIBLE_DEVICES=0 python main.py -dd dataset -cd save -rd resource

Citation

@inproceedings{li2021empirical,
    title={Empirical Analysis of Unlabeled Entity Problem in Named Entity Recognition},
    author={Yangming Li and Lemao Liu and Shuming Shi},
    booktitle={International Conference on Learning Representations},
    year={2021},
    url={https://openreview.net/forum?id=5jRVa89sZk}
}

Owner

Yangming Li
