This repository contains code accompanying the paper "An End-to-End Chinese Text Normalization Model based on Rule-Guided Flat-Lattice Transformer"

Related tags

Deep LearningFlatTN
Overview

FlatTN

This repository contains code accompanying the paper "An End-to-End Chinese Text Normalization Model based on Rule-Guided Flat-Lattice Transformer" published on ICASSP 2022.

Requirement

Python: 3.7.3
PyTorch: 1.2.0
FastNLP: 0.5.0
Numpy: 1.16.4
fitlog

For more about FastNLP, please visit here. For Fitlog, please refer to this.

Dataset download

We release a large-scale Chinese Text Normalization (TN) Dataset in corporatioin with Databaker (Beijing) Technology Co., Ltd.

To download the dataset, please visit https://www.data-baker.com/en/#/data/index/TNtts.

(For Chinese version of the download page, please visit https://www.data-baker.com/data/index/TNtts.)

Data preprocessing

The raw dataset in jsonl format are saved at: dataset/processed/CN_TN_epoch-01-28645_2.jsonl

We preprocessed the data into the BMES format, and divided the data into traindevtest by 8:1:1.

dataset/processed/shuffled_BMES
                      ├── train.char.bmes
                      ├── dev.char.bmes
                      └── test.char.bmes

An example of the processed data in BMES format is as follows:

2 B-DIGIT
0 M-DIGIT
1 M-DIGIT
5 E-DIGIT
年 S-SELF
, S-PUNC
只 S-SELF
剩 S-SELF
3 B-CARDINAL
9 E-CARDINAL
天 S-SELF
。 S-PUNC

You can re-run our code to preprocess and divide the raw dataset again:

cd dataset/processed
python preprocess.py

You can also used the following code to get statistics of all NSW categories of the data:

cd dataset/processed
python stat.py

Training

Our code are in version V1, run training code

cd V1
python flat_main.py --dataset databaker

Our proposed rule base are saved in a python file: V1/add_rule.py

Acknowledgement

Our code is based on Flat-Lattice-Transformer (FLAT) from LeeSureman.

For more information about FLAT, please refer to LeeSureman/Flat-Lattice-Transformer.

Owner
THUHCSI
Human-Computer Speech Interaction Lab at Tsinghua University
THUHCSI
PyTorch implementation of Federated Learning with Non-IID Data, and federated learning algorithms, including FedAvg, FedProx.

Federated Learning with Non-IID Data This is an implementation of the following paper: Yue Zhao, Meng Li, Liangzhen Lai, Naveen Suda, Damon Civin, Vik

Youngjoon Lee 48 Dec 29, 2022
Code for "Layered Neural Rendering for Retiming People in Video."

Layered Neural Rendering in PyTorch This repository contains training code for the examples in the SIGGRAPH Asia 2020 paper "Layered Neural Rendering

Google 154 Dec 16, 2022
Code for "PV-RAFT: Point-Voxel Correlation Fields for Scene Flow Estimation of Point Clouds", CVPR 2021

PV-RAFT This repository contains the PyTorch implementation for paper "PV-RAFT: Point-Voxel Correlation Fields for Scene Flow Estimation of Point Clou

Yi Wei 43 Dec 05, 2022
PyTorch implementation of MuseMorphose, a Transformer-based model for music style transfer.

MuseMorphose This repository contains the official implementation of the following paper: Shih-Lun Wu, Yi-Hsuan Yang MuseMorphose: Full-Song and Fine-

Yating Music, Taiwan AI Labs 142 Jan 08, 2023
Bayesian regularization for functional graphical models.

BayesFGM Paper: Jiajing Niu, Andrew Brown. Bayesian regularization for functional graphical models. Requirements R version 3.6.3 and up Python 3.6 and

0 Oct 07, 2021
TensorFlow implementation of "Attention is all you need (Transformer)"

[TensorFlow 2] Attention is all you need (Transformer) TensorFlow implementation of "Attention is all you need (Transformer)" Dataset The MNIST datase

YeongHyeon Park 4 Jan 05, 2022
Brain tumor detection using Convolution-Neural Network (CNN)

Detect and Classify Brain Tumor using CNN. A system performing detection and classification by using Deep Learning Algorithms using Convolution-Neural Network (CNN).

assia 1 Feb 07, 2022
Optimus: the first large-scale pre-trained VAE language model

Optimus: the first pre-trained Big VAE language model This repository contains source code necessary to reproduce the results presented in the EMNLP 2

314 Dec 19, 2022
The source code of the paper "Understanding Graph Neural Networks from Graph Signal Denoising Perspectives"

GSDN-F and GSDN-EF This repository provides a reference implementation of GSDN-F and GSDN-EF as described in the paper "Understanding Graph Neural Net

Guoji Fu 18 Nov 14, 2022
ThunderSVM: A Fast SVM Library on GPUs and CPUs

What's new We have recently released ThunderGBM, a fast GBDT and Random Forest library on GPUs. add scikit-learn interface, see here Overview The miss

Xtra Computing Group 1.4k Dec 22, 2022
Incorporating Transformer and LSTM to Kalman Filter with EM algorithm

Deep learning based state estimation: incorporating Transformer and LSTM to Kalman Filter with EM algorithm Overview Kalman Filter requires the true p

zshicode 57 Dec 27, 2022
Cold Brew: Distilling Graph Node Representations with Incomplete or Missing Neighborhoods

Cold Brew: Distilling Graph Node Representations with Incomplete or Missing Neighborhoods Introduction Graph Neural Networks (GNNs) have demonstrated

37 Dec 15, 2022
FAMIE is a comprehensive and efficient active learning (AL) toolkit for multilingual information extraction (IE)

FAMIE: A Fast Active Learning Framework for Multilingual Information Extraction

18 Sep 01, 2022
Code accompanying paper: Meta-Learning to Improve Pre-Training

Meta-Learning to Improve Pre-Training This folder contains code to run experiments in the paper Meta-Learning to Improve Pre-Training, NeurIPS 2021. P

28 Dec 31, 2022
Metric learning algorithms in Python

metric-learn: Metric Learning in Python metric-learn contains efficient Python implementations of several popular supervised and weakly-supervised met

1.3k Jan 02, 2023
PyTorch implementation of DCT fast weight RNNs

DCT based fast weights This repository contains the official code for the paper: Training and Generating Neural Networks in Compressed Weight Space. T

Kazuki Irie 4 Dec 24, 2022
Masked regression code - Masked Regression

Masked Regression MR - Python Implementation This repositery provides a python implementation of MR (Masked Regression). MR can efficiently synthesize

Arbish Akram 1 Dec 23, 2021
CondenseNet V2: Sparse Feature Reactivation for Deep Networks

CondenseNetV2 This repository is the official Pytorch implementation for "CondenseNet V2: Sparse Feature Reactivation for Deep Networks" paper by Le Y

Haojun Jiang 74 Dec 12, 2022
Implementation of Self-supervised Graph-level Representation Learning with Local and Global Structure (ICML 2021).

Self-supervised Graph-level Representation Learning with Local and Global Structure Introduction This project is an implementation of ``Self-supervise

MilaGraph 50 Dec 09, 2022
This repo contains the implementation of YOLOv2 in Keras with Tensorflow backend.

Easy training on custom dataset. Various backends (MobileNet and SqueezeNet) supported. A YOLO demo to detect raccoon run entirely in brower is accessible at https://git.io/vF7vI (not on Windows).

Huynh Ngoc Anh 1.7k Dec 24, 2022