Chinese NER(Named Entity Recognition) using BERT(Softmax, CRF, Span)

Overview

Chinese NER using Bert

BERT for Chinese NER.

dataset list

  1. cner: datasets/cner
  2. CLUENER: https://github.com/CLUEbenchmark/CLUENER

model list

  1. BERT+Softmax
  2. BERT+CRF
  3. BERT+Span

requirement

  1. 1.1.0 =< PyTorch < 1.5.0
  2. cuda=9.0
  3. python3.6+

input format

Input format (prefer BIOS tag scheme), with each character its label for one line. Sentences are splited with a null line.

美	B-LOC
国	I-LOC
的	O
华	B-PER
莱	I-PER
士	I-PER

我	O
跟	O
他	O

run the code

  1. Modify the configuration information in run_ner_xxx.py or run_ner_xxx.sh .
  2. sh scripts/run_ner_xxx.sh

note: file structure of the model

├── prev_trained_model
|  └── bert_base
|  |  └── pytorch_model.bin
|  |  └── config.json
|  |  └── vocab.txt
|  |  └── ......

CLUENER result

The overall performance of BERT on dev:

Accuracy (entity) Recall (entity) F1 score (entity)
BERT+Softmax 0.7897 0.8031 0.7963
BERT+CRF 0.7977 0.8177 0.8076
BERT+Span 0.8132 0.8092 0.8112
BERT+Span+adv 0.8267 0.8073 0.8169
BERT-small(6 layers)+Span+kd 0.8241 0.7839 0.8051
BERT+Span+focal_loss 0.8121 0.8008 0.8064
BERT+Span+label_smoothing 0.8235 0.7946 0.8088

ALBERT for CLUENER

The overall performance of ALBERT on dev:

model version Accuracy(entity) Recall(entity) F1(entity) Train time/epoch
albert base_google 0.8014 0.6908 0.7420 0.75x
albert large_google 0.8024 0.7520 0.7763 2.1x
albert xlarge_google 0.8286 0.7773 0.8021 6.7x
bert google 0.8118 0.8031 0.8074 -----
albert base_bright 0.8068 0.7529 0.7789 0.75x
albert large_bright 0.8152 0.7480 0.7802 2.2x
albert xlarge_bright 0.8222 0.7692 0.7948 7.3x

Cner result

The overall performance of BERT on dev(test):

Accuracy (entity) Recall (entity) F1 score (entity)
BERT+Softmax 0.9586(0.9566) 0.9644(0.9613) 0.9615(0.9590)
BERT+CRF 0.9562(0.9539) 0.9671(0.9644) 0.9616(0.9591)
BERT+Span 0.9604(0.9620) 0.9617(0.9632) 0.9611(0.9626)
BERT+Span+focal_loss 0.9516(0.9569) 0.9644(0.9681) 0.9580(0.9625)
BERT+Span+label_smoothing 0.9566(0.9568) 0.9624(0.9656) 0.9595(0.9612)
Owner
Weitang Liu
weibo: https://weibo.com/277974397
Weitang Liu
📔️ Generate a text-based journal from a template file.

JGen 📔️ Generate a text-based journal from a template file. Contents Getting Started Example Overview Usage Details Reserved Keywords Gotchas Getting

Harrison Broadbent 21 Sep 25, 2022
Japanese synonym library

chikkarpy chikkarpyはchikkarのPython版です。 chikkarpy is a Python version of chikkar. chikkarpy は Sudachi 同義語辞書を利用し、SudachiPyの出力に同義語展開を追加するために開発されたライブラリです。

Works Applications 48 Dec 14, 2022
Production First and Production Ready End-to-End Keyword Spotting Toolkit

Production First and Production Ready End-to-End Keyword Spotting Toolkit

223 Jan 02, 2023
An automated program that helps customers of Pizza Palour place their pizza orders

PIzza_Order_Assistant Introduction An automated program that helps customers of Pizza Palour place their pizza orders. The program uses voice commands

Tindi Sommers 1 Dec 26, 2021
Machine learning classifiers to predict American Sign Language .

ASL-Classifiers American Sign Language (ASL) is a natural language that serves as the predominant sign language of Deaf communities in the United Stat

Tarek idrees 0 Feb 08, 2022
Client library to download and publish models and other files on the huggingface.co hub

huggingface_hub Client library to download and publish models and other files on the huggingface.co hub Do you have an open source ML library? We're l

Hugging Face 644 Jan 01, 2023
leaking paid token generator that was a shit lmao for 100$ haha

Discord-Token-Generator-Leaked leaking paid token generator that was a shit lmao for 100$ he selling it for 100$ wth here the code enjoy don't forget

Keevo 5 Apr 15, 2022
Shellcode antivirus evasion framework

Schrodinger's Cat Schrodinger'sCat is a Shellcode antivirus evasion framework Technical principle Please visit my blog https://idiotc4t.com/ How to us

idiotc4t 27 Jul 09, 2022
Autoregressive Entity Retrieval

The GENRE (Generative ENtity REtrieval) system as presented in Autoregressive Entity Retrieval implemented in pytorch. @inproceedings{decao2020autoreg

Meta Research 611 Dec 16, 2022
NLP and Text Generation Experiments in TensorFlow 2.x / 1.x

Code has been run on Google Colab, thanks Google for providing computational resources Contents Natural Language Processing(自然语言处理) Text Classificati

1.5k Nov 14, 2022
A simple Flask site that allows users to create, update, and delete posts in a database, as well as perform basic NLP tasks on the posts.

A simple Flask site that allows users to create, update, and delete posts in a database, as well as perform basic NLP tasks on the posts.

Ian 1 Jan 15, 2022
NLP techniques such as named entity recognition, sentiment analysis, topic modeling, text classification with Python to predict sentiment and rating of drug from user reviews.

This file contains the following documents sumbited for Baruch CIS9665 group 9 fall 2021. 1. Dataset: drug_reviews.csv 2. python codes for text classi

Aarif Munwar Jahan 2 Jan 04, 2023
Dé op-de-vlucht Pieton vertaler. Wereldwijd gebruikt door meer dan 1.000+ succesvolle bedrijven!

Dé op-de-vlucht Pieton vertaler. Wereldwijd gebruikt door meer dan 1.000+ succesvolle bedrijven!

Lau 1 Dec 17, 2021
Cải thiện Elasticsearch trong bài toán semantic search sử dụng phương pháp Sentence Embeddings

Cải thiện Elasticsearch trong bài toán semantic search sử dụng phương pháp Sentence Embeddings Trong bài viết này mình sẽ sử dụng pretrain model SimCS

Vo Van Phuc 18 Nov 25, 2022
Natural Language Processing with transformers

we want to create a repo to illustrate usage of transformers in chinese

Datawhale 763 Dec 27, 2022
Findings of ACL 2021

Assessing Dialogue Systems with Distribution Distances [arXiv][code] We propose to measure the performance of a dialogue system by computing the distr

Yahui Liu 16 Feb 24, 2022
Generate product descriptions, blogs, ads and more using GPT architecture with a single request to TextCortex API a.k.a Hemingwai

TextCortex - HemingwAI Generate product descriptions, blogs, ads and more using GPT architecture with a single request to TextCortex API a.k.a Hemingw

TextCortex AI 27 Nov 28, 2022
Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

TextBlob: Simplified Text Processing Homepage: https://textblob.readthedocs.io/ TextBlob is a Python (2 and 3) library for processing textual data. It

Steven Loria 8.4k Dec 26, 2022
Header-only C++ HNSW implementation with python bindings

Hnswlib - fast approximate nearest neighbor search Header-only C++ HNSW implementation with python bindings. NEWS: version 0.6 Thanks to (@dyashuni) h

2.3k Jan 05, 2023
Stuff related to Ben Eater's 8bit breadboard computer

8bit breadboard computer simulator This is an assembler + simulator/emulator of Ben Eater's 8bit breadboard computer. For a version with its RAM upgra

Marijn van Vliet 29 Dec 29, 2022