ARU-Net

Overview

ARU-Net: A Neural Pixel Labeler for Layout Analysis of Historical Documents

Contents

  • Introduction
  • Installation
  • Demo
  • Training
  • Comments
  • References

Introduction

This is the TensorFlow code corresponding to A Two-Stage Method for Text Line Detection in Historical Documents [1]. The repo contains the neural pixel labeling part described in the paper, including the so-called ARU-Net (among others), which is basically an extended version of the well-known U-Net [2]. Besides the model and the basic workflow to train and test models, different data augmentation strategies are implemented to reduce the amount of training data needed. The repo's features are summarized below:

  • Inference Demo
    • Trained and frozen TensorFlow graph included
    • Easy to reuse for your own inference tests
  • Workflow
    • Full training workflow to parametrize and train your own models
    • Contains different models, data augmentation strategies, and loss functions
    • Training on a specific GPU, which enables training several models in parallel on a multi-GPU system
    • Easy validation of trained models using either the classical or the EMA-shadow weights

Please cite [1] if you find this repo useful and/or use this software for your own work.

Installation

  1. Use Python 2.7
  2. Any TensorFlow version > 1.0 should work.
  3. Python packages: matplotlib (>=1.3.1), pillow (>=2.1.0), scipy (>=1.0.0), scikit-image (>=0.13.1), click (>=5.x); a pip command is given after this list
  4. Clone the repo
  5. Done
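
If any of these packages are missing, they can be installed with pip; the version pins below simply mirror the list above (adjust them to your environment):
pip install "matplotlib>=1.3.1" "pillow>=2.1.0" "scipy>=1.0.0" "scikit-image>=0.13.1" "click>=5.0"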

Demo

To run the demo, follow these steps:

  1. Open a shell
  2. Make sure TensorFlow is available, e.g., enter your Docker environment or activate your conda environment
  3. Navigate to the repo folder YOUR_PATH/ARU-Net/
  4. Run:
python run_demo_inference.py 

The demo will load a trained model and perform inference on five sample images from the cBAD test set [3], [4]. The network was trained to predict the position of baselines as well as separators marking the beginning and end of each text line. After running the script you should see a matplotlib window; to go to the next image, just close it.
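
For your own inference tests, the frozen graph can be reused outside of the demo script. The following is a minimal sketch of loading a frozen TensorFlow 1.x graph and running it on a single grayscale image; the graph path, the image path, and the tensor names (inImg:0, output:0) are placeholders for illustration and have to be looked up in run_demo_inference.py and the shipped graph:

# Minimal sketch: load a frozen TF 1.x graph and run inference on one image.
# The graph path, image path, and the tensor names "inImg:0" / "output:0" are
# assumptions; check run_demo_inference.py for the actual names used by this repo.
import numpy as np
import tensorflow as tf
from PIL import Image

graph_path = 'demo_nets/model.pb'        # placeholder path to the frozen graph
image_path = 'demo_images/sample.jpg'    # placeholder path to a test image

graph_def = tf.GraphDef()
with tf.gfile.GFile(graph_path, 'rb') as f:
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name='')
    in_tensor = graph.get_tensor_by_name('inImg:0')
    out_tensor = graph.get_tensor_by_name('output:0')

    img = np.array(Image.open(image_path).convert('L'), dtype=np.float32)
    batch = img[np.newaxis, :, :, np.newaxis]  # shape [1, H, W, 1]

    with tf.Session(graph=graph) as sess:
        prediction = sess.run(out_tensor, feed_dict={in_tensor: batch})
    print(prediction.shape)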

Example

The example images are sampled from the cBAD test set [3], [4]. One image along with its results is shown below.

(Figure: sample input image and the corresponding network output.)

Training

This section describes step-by-step the procedure to train your own model.

Train data:

The following describes what the training data should look like:

  • The images along with their pixel ground truth have to be in the same folder
  • For each image X.jpg, there have to be images named X_GT0.jpg, X_GT1.jpg, X_GT2.jpg, ... (one GT image for each channel to be predicted)
  • Each ground truth image is binary and contains ones at positions where the corresponding class is present and zeros otherwise (see demo_images/demo_traindata for a sample)
  • Generate a list file containing, row-wise, the absolute paths to the document images (not the GT images); a short sketch for this is given below
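
A minimal sketch for generating such a list file, assuming the training images lie in a single folder and the GT images follow the X_GT<n>.jpg naming scheme described above (the folder and list file paths are placeholders):

# Minimal sketch: write the absolute paths of all document images (not the GT
# images) of a training folder, one path per row, into a list file.
# The folder and list file paths are placeholders; adapt them to your setup.
import glob
import os

train_dir = '/absolute/path/to/traindata'       # placeholder
list_path = '/absolute/path/to/train_list.lst'  # placeholder

with open(list_path, 'w') as f:
    for path in sorted(glob.glob(os.path.join(train_dir, '*.jpg'))):
        if '_GT' in os.path.basename(path):
            continue  # skip the pixel ground truth images
        f.write(os.path.abspath(path) + '\n')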

Val data:

The validation data should look like the training data: provide the validation images together with their pixel ground truth in one folder, and generate a separate list file containing, row-wise, the absolute paths to the validation images.

Train the model:

The following describes how to train a model:

  • Have a look at the pix_lab/main/train_aru.py script
  • Parametrize it as you wish (have a look at the data_provider, cost, and optimizer scripts to see all parameters)
  • Setting the correct paths, adapting the number of output classes, and using the default parametrization should work fine for a first training
  • Run:
python -u pix_lab/main/train_aru.py &> info.log 
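
To pin a training run to a specific GPU of a multi-GPU system (and thereby train several models in parallel), one common, repo-independent way is to restrict the devices visible to TensorFlow before starting the script, for example:
CUDA_VISIBLE_DEVICES=0 python -u pix_lab/main/train_aru.py &> info.log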

Validate the model:

The following describes how to validate a trained model:

  • Training and validation losses are printed to info.log
  • To validate the checkpoints using the classical weights as well as their EMA shadows, adapt and run:
python pix_lab/main/validate_ckpt.py
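
The EMA-shadow weights mentioned above are exponential moving averages of the trainable variables, maintained alongside the optimizer updates during training. The following generic TensorFlow 1.x sketch illustrates the concept only (it is not this repo's code): the shadow values are updated after every training step and can be restored in place of the classical weights for validation.

# Generic TF 1.x illustration of EMA-shadow weights (not this repo's code).
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 1])
y = tf.placeholder(tf.float32, [None, 1])
w = tf.Variable(tf.zeros([1, 1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))
opt_step = tf.train.AdamOptimizer(1e-3).minimize(loss)

# Maintain exponential moving averages ("shadows") of all trainable variables.
ema = tf.train.ExponentialMovingAverage(decay=0.999)
with tf.control_dependencies([opt_step]):
    train_op = ema.apply(tf.trainable_variables())

# Classical weights: tf.train.Saver(); EMA-shadow weights:
ema_saver = tf.train.Saver(ema.variables_to_restore())
# ema_saver.restore(sess, checkpoint_path) loads the shadow values into the variables.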

Comments

If you are interested in a related problem, this repo could help you as well. Besides baseline detection, the ARU-Net can be used for any pixel labeling task; for example, it can easily be applied to binarization or page segmentation.

References

Please cite [1] if using this code.

A Two-Stage Method for Text Line Detection in Historical Documents

[1] T. Grüning, G. Leifert, T. Strauß, R. Labahn, A Two-Stage Method for Text Line Detection in Historical Documents

@article{Gruning2018,
arxivId = {1802.03345},
author = {Gr{\"{u}}ning, Tobias and Leifert, Gundram and Strau{\ss}, Tobias and Labahn, Roger},
title = {{A Two-Stage Method for Text Line Detection in Historical Documents}},
url = {http://arxiv.org/abs/1802.03345},
year = {2018}
}

U-Net: Convolutional Networks for Biomedical Image Segmentation

[2] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional Networks for Biomedical Image Segmentation

@article{Ronneberger2015,
arxivId = {1505.04597},
author = {Ronneberger, Olaf and Fischer, Philipp and Brox, Thomas},
journal = {MICCAI},
pages = {234--241},
title = {{U-Net: Convolutional Networks for Biomedical Image Segmentation}},
year = {2015}
}

READ-BAD: A New Dataset and Evaluation Scheme for Baseline Detection in Archival Documents

[3] T. Grüning, R. Labahn, M. Diem, F. Kleber, S. Fiel, READ-BAD: A New Dataset and Evaluation Scheme for Baseline Detection in Archival Documents

@article{Gruning2017,
arxivId = {1705.03311},
author = {Gr{\"{u}}ning, Tobias and Labahn, Roger and Diem, Markus and Kleber, Florian and Fiel, Stefan},
title = {{READ-BAD: A New Dataset and Evaluation Scheme for Baseline Detection in Archival Documents}},
url = {http://arxiv.org/abs/1705.03311},
year = {2017}
}

ScriptNet: ICDAR 2017 Competition on Baseline Detection in Archival Documents (cBAD)

[4] M. Diem, F. Kleber, S. Fiel, T. Grüning, B. Gatos, ScriptNet: ICDAR 2017 Competition on Baseline Detection in Archival Documents (cBAD)

@misc{Diem2017,
author = {Diem, Markus and Kleber, Florian and Fiel, Stefan and Gr{\"{u}}ning, Tobias and Gatos, Basilis},
doi = {10.5281/zenodo.257972},
title = {ScriptNet: ICDAR 2017 Competition on Baseline Detection in Archival Documents (cBAD)},
year = {2017}
}