This repository contains code accompanying the paper "An End-to-End Chinese Text Normalization Model based on Rule-Guided Flat-Lattice Transformer"

Last update: Nov 28, 2022

FlatTN

This repository contains code accompanying the paper "An End-to-End Chinese Text Normalization Model based on Rule-Guided Flat-Lattice Transformer", published at ICASSP 2022.

Requirements

Python: 3.7.3
PyTorch: 1.2.0
FastNLP: 0.5.0
Numpy: 1.16.4
fitlog

For more about FastNLP and fitlog, please refer to their respective project pages.

Dataset download

We release a large-scale Chinese Text Normalization (TN) dataset in cooperation with Databaker (Beijing) Technology Co., Ltd.

To download the dataset, please visit https://www.data-baker.com/en/#/data/index/TNtts.

(For Chinese version of the download page, please visit https://www.data-baker.com/data/index/TNtts.)

Data preprocessing

The raw dataset, in JSONL format, is saved at: dataset/processed/CN_TN_epoch-01-28645_2.jsonl

We preprocessed the data into the BMES format and divided it into train, dev, and test sets at a ratio of 8:1:1.

dataset/processed/shuffled_BMES
                      ├── train.char.bmes
                      ├── dev.char.bmes
                      └── test.char.bmes
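
The 8:1:1 division can be sketched as below. This is a minimal illustration only, not the logic of preprocess.py: the function name `split_811` and the fixed shuffle seed are assumptions made for the example.

```python
import random


def split_811(items, seed=0):
    """Shuffle items and split them into train/dev/test at roughly 8:1:1.

    Hypothetical helper for illustration; preprocess.py may differ.
    """
    items = list(items)
    # A seeded Random instance keeps the shuffle reproducible.
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_dev = len(items) // 10
    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    # Whatever remains after train and dev goes to test.
    test = items[n_train + n_dev:]
    return train, dev, test
```

Integer arithmetic is used for the cut points so that the three parts always cover every item exactly once, with any remainder falling into the test part.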

An example of the processed data in BMES format is as follows:

2 B-DIGIT
0 M-DIGIT
1 M-DIGIT
5 E-DIGIT
年 S-SELF
， S-PUNC
只 S-SELF
剩 S-SELF
3 B-CARDINAL
9 E-CARDINAL
天 S-SELF
。 S-PUNC
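
To make the tagging scheme concrete, the sketch below groups character-level BMES lines back into spans (B = begin, M = middle, E = end, S = single). It assumes each line holds one character and one tag separated by a space, as in the example above; the function name `bmes_to_spans` is hypothetical and is not part of this repository.

```python
def bmes_to_spans(lines):
    """Group 'char TAG' lines into (text, category) spans."""
    spans = []
    buffer, current = [], None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank separator lines
        char, tag = line.rsplit(" ", 1)
        prefix, category = tag.split("-", 1)
        if prefix == "S":
            # A single-character span stands on its own.
            spans.append((char, category))
        elif prefix == "B":
            buffer, current = [char], category
        elif prefix == "M":
            buffer.append(char)
        elif prefix == "E":
            buffer.append(char)
            spans.append(("".join(buffer), current))
            buffer, current = [], None
    return spans
```

Applied to the example above, this groups `2 0 1 5` into one DIGIT span and `3 9` into one CARDINAL span, while every other character becomes its own span.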

You can re-run our code to preprocess and divide the raw dataset again:

cd dataset/processed
python preprocess.py

You can also use the following commands to get statistics for all NSW categories in the data:

cd dataset/processed
python stat.py
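
As a rough idea of what such statistics involve, the sketch below counts one occurrence per span for each category in BMES-formatted lines. It is not the code in stat.py. The function name `count_categories` is hypothetical, and treating SELF and PUNC as ordinary (non-NSW) tags to be skipped is an assumption.

```python
from collections import Counter


def count_categories(lines, skip=("SELF", "PUNC")):
    """Count spans per category in 'char TAG' lines.

    Assumption: SELF and PUNC mark ordinary text, so they are skipped.
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # ignore blank or malformed lines
        prefix, _, category = parts[1].partition("-")
        # Each span starts with exactly one B- or S- tag, so counting
        # those counts spans rather than characters.
        if prefix in ("B", "S") and category not in skip:
            counts[category] += 1
    return counts
```

Passing `skip=()` would include every tag category in the counts.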

Training

Our code is in the V1 directory. To run training:

cd V1
python flat_main.py --dataset databaker

Our proposed rule base is saved in a Python file: V1/add_rule.py
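
For readers unfamiliar with rule-based matching, the sketch below shows the general idea of using patterns to locate candidate non-standard-word spans in a sentence. It is purely illustrative: the two rules, their labels, and the function name `match_rules` are hypothetical and are not taken from V1/add_rule.py, whose actual rules and interface may look quite different.

```python
import re

# Hypothetical rules for illustration only; not the contents of add_rule.py.
RULES = [
    ("YEAR_LIKE", re.compile(r"\d{4}(?=年)")),  # four digits followed by 年
    ("NUMBER", re.compile(r"\d+")),             # any run of digits
]


def match_rules(text):
    """Return sorted (start, end, matched_text, label) for every rule match."""
    spans = []
    for label, pattern in RULES:
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), m.group(), label))
    return sorted(spans)
```

Overlapping matches from different rules are all kept here, so the same span can carry several candidate labels.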

Acknowledgement

Our code is based on Flat-Lattice-Transformer (FLAT) from LeeSureman.

For more information about FLAT, please refer to LeeSureman/Flat-Lattice-Transformer.

Owner

THUHCSI
