Convolutional Recurrent Neural Networks (CRNN) for Scene Text Recognition

Overview

CRNN_Tensorflow

This is a TensorFlow implementation of a Deep Neural Network for scene text recognition. It is mainly based on the paper "An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition". You can refer to the paper for architecture details. Thanks to the author Baoguang Shi.

The model consists of a CNN stage extracting features which are fed to an RNN stage (Bi-LSTM) and a CTC loss.
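
To make the role of the CTC stage more concrete, here is a minimal sketch of greedy (best-path) CTC decoding in plain Python. It is an illustration only, not the decoder used in this repo: the function name, the toy alphabet and the blank index are all assumptions.

```python
from itertools import groupby


def ctc_greedy_decode(frame_scores, alphabet, blank_index):
    """Turn per-frame class scores into a string with best-path CTC decoding."""
    # Pick the highest-scoring class index for every time step.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]
    # Merge consecutive repeats, then drop the blank symbol.
    collapsed = [index for index, _ in groupby(best_path)]
    return "".join(alphabet[index] for index in collapsed if index != blank_index)


# Toy example (assumed values): classes are "a", "b" and a blank at index 2.
alphabet = "ab"
blank = 2
scores = [
    [0.9, 0.05, 0.05],  # a
    [0.8, 0.1, 0.1],    # a again (merged with the previous frame)
    [0.1, 0.1, 0.8],    # blank
    [0.7, 0.2, 0.1],    # a (kept, because a blank separates it from the first a)
    [0.1, 0.8, 0.1],    # b
    [0.1, 0.1, 0.8],    # blank
]
print(ctc_greedy_decode(scores, alphabet, blank))  # prints "aab"
```

The blank symbol is what lets the network emit doubled characters: without the blank frame between them, the two runs of "a" would be merged into one.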

Installation

This software was developed on Ubuntu 16.04 (x64) using Python 3.5 and TensorFlow 1.12. Since it uses some recent TensorFlow features, it is incompatible with older versions.

The following methods are provided to install dependencies:

Conda

You can create a conda environment with the required dependencies using:

conda env create -f crnntf-env.yml

Pip

Required packages may be installed with

pip3 install -r requirements.txt

Testing the pre-trained model

Evaluate the model on the synth90k dataset

This repo provides a model pre-trained on the Synth90k dataset; the pretrained CRNN model weights can be found here. Once the tfrecords file of the Synth90k dataset has been successfully generated, you can evaluate the model with the following script:

python tools/evaluate_shadownet.py --dataset_dir PATH/TO/YOUR/DATASET_DIR \
    --weights_path PATH/TO/YOUR/MODEL_WEIGHTS_PATH \
    --char_dict_path PATH/TO/CHAR_DICT_PATH \
    --ord_map_dict_path PATH/TO/ORD_MAP_PATH \
    --process_all 1 --visualize 1

If you set --visualize to 1, the expected output during evaluation is:

evaluate output

After the evaluation finishes, you should see something like this:

evaluation_result

The model's main evaluation metrics are as follows:

Test Dataset Size: 891927 synth90k test images

Per char Precision: 0.974325 (averaged without per-class weighting)

Full sequence Precision: 0.932981 (averaged without per-class weighting)

For Per char Precision:

single_label_accuracy = correct_predicted_char_nums_of_single_sample / single_label_char_nums

avg_label_accuracy = sum(single_label_accuracy) / label_nums

For Full sequence Precision:

single_label_accuracy = 1 if the prediction result is exactly the same as label else 0

avg_label_accuracy = sum(single_label_accuracy) / label_nums
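
The two metrics above can be sketched in plain Python as follows. This is an illustration rather than the repo's evaluation code: the function names are hypothetical, and it assumes that "correct predicted chars" means characters that match the label at the same position.

```python
def per_char_precision(labels, predictions):
    """Mean over samples of (position-wise correct characters / label length)."""
    scores = []
    for label, prediction in zip(labels, predictions):
        if not label:
            # Assumption: an empty label counts as correct only if the
            # prediction is empty as well.
            scores.append(1.0 if not prediction else 0.0)
            continue
        # zip stops at the shorter string, so missing characters count as wrong.
        correct = sum(
            1 for expected, actual in zip(label, prediction) if expected == actual
        )
        scores.append(correct / len(label))
    return sum(scores) / len(scores)


def full_sequence_precision(labels, predictions):
    """Fraction of samples whose prediction equals the label exactly."""
    matches = [
        1 if label == prediction else 0
        for label, prediction in zip(labels, predictions)
    ]
    return sum(matches) / len(matches)


labels = ["hello", "world", "text"]
predictions = ["hello", "w0rld", "tex"]
print(per_char_precision(labels, predictions))       # about 0.85
print(full_sequence_precision(labels, predictions))  # about 0.333
```

In the toy data, the per-sample scores are 1.0, 0.8 and 0.75, while only the first prediction counts as a full-sequence match.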

Part of the confusion matrix of every single char looks like this:

evaluation_confusion_matrix

Test the model on a single image

If you want to test a single image, you can do so with

python tools/test_shadownet.py --image_path PATH/TO/IMAGE \
    --weights_path PATH/TO/MODEL_WEIGHTS \
    --char_dict_path PATH/TO/CHAR_DICT_PATH \
    --ord_map_dict_path PATH/TO/ORD_MAP_PATH

Test example images

Example test_01.jpg

Example image1

Example test_02.jpg

Example image2

Example test_03.jpg

Example image3

Training your own model

Data preparation

Download the whole Synth90k dataset here and extract all the files into a root directory, which should contain several txt files and several folders filled with pictures. Then convert the whole dataset into TensorFlow records as follows:

python tools/write_tfrecords.py \
    --dataset_dir PATH/TO/SYNTH90K_DATASET_ROOT_DIR \
    --save_dir PATH/TO/TFRECORDS_DIR

During conversion, all source images are scaled to (32, 100).

Training

For all the available training parameters, check global_configuration/config.py, then train your model with

python tools/train_shadownet.py --dataset_dir PATH/TO/YOUR/TFRECORDS \
    --char_dict_path PATH/TO/CHAR_DICT_PATH \
    --ord_map_dict_path PATH/TO/ORD_MAP_PATH

If you wish, you can add more metrics to the training progress messages with --decode_outputs 1, but this will slow training down. You can also continue the training process from a snapshot with

python tools/train_shadownet.py --dataset_dir PATH/TO/YOUR/TFRECORDS \
    --weights_path PATH/TO/YOUR/PRETRAINED_MODEL_WEIGHTS \
    --char_dict_path PATH/TO/CHAR_DICT_PATH --ord_map_dict_path PATH/TO/ORD_MAP_PATH

If you have multiple GPUs in your local machine, you can use multi-GPU training to work with a larger input batch size. This is supported as follows:

python tools/train_shadownet.py --dataset_dir PATH/TO/YOUR/TFRECORDS \
    --char_dict_path PATH/TO/CHAR_DICT_PATH --ord_map_dict_path PATH/TO/ORD_MAP_PATH \
    --multi_gpus 1

The sequence distance is computed as the distance between two sparse tensors, so the lower the distance value, the better the model performs. The training accuracy is computed as the character-wise precision between the prediction and the ground truth, so the higher the value, the better the model performs.
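
As an illustration of the sequence distance, the sketch below computes a Levenshtein (edit) distance in plain Python and normalizes it by the ground-truth length, which is what TensorFlow's `tf.edit_distance` does on sparse tensors with `normalize=True`. That the training script uses exactly this operation is an assumption, and the function names are hypothetical.

```python
def edit_distance(prediction, truth):
    """Levenshtein distance via dynamic programming with a rolling row."""
    previous = list(range(len(truth) + 1))
    for i, predicted_char in enumerate(prediction, start=1):
        current = [i]
        for j, true_char in enumerate(truth, start=1):
            substitution = previous[j - 1] + (predicted_char != true_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(deletion, insertion, substitution))
        previous = current
    return previous[-1]


def normalized_sequence_distance(predictions, truths):
    """Mean edit distance, each divided by the ground-truth length."""
    ratios = [
        edit_distance(prediction, truth) / len(truth)
        for prediction, truth in zip(predictions, truths)
    ]
    return sum(ratios) / len(ratios)


print(edit_distance("kitten", "sitting"))  # prints 3
print(normalized_sequence_distance(["w0rld", "text"], ["world", "text"]))  # about 0.1
```

A perfect prediction gives a distance of 0, which is why lower values indicate a better model.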

Tensorflow Serving

Thanks to Eldon for contributing the TensorFlow Serving support :)

TensorFlow Model Server is a powerful tool for serving deep learning models in production environments. Here is a script that converts the checkpoint model file into a TensorFlow SavedModel, which can be used with TensorFlow Model Server to serve the CRNN model. If the script does not run correctly, check that the checkpoint file path in the bash script is correct.

bash tfserve/export_crnn_saved_model.sh

To start the TensorFlow Model Server, you can use the following script:

bash tfserve/run_tfserve_crnn_gpu.sh

There are two ways to test the Python client of the CRNN model. First, you can test the server via an HTTP/REST request by running

python tfserve/crnn_python_client_via_request.py ./data/test_images/test_01.jpg

Second, you can test the server via gRPC by running

python tfserve/crnn_python_client_via_grpc.py
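
For reference, TensorFlow Serving's REST API accepts prediction requests as a JSON POST to `/v1/models/<model_name>:predict`. The sketch below only builds such a request with the Python standard library. The model name `crnn`, the port `8501` and the input layout (a nested list of pixel values) are assumptions for illustration and may not match what the export script or `crnn_python_client_via_request.py` actually use.

```python
import json
import urllib.request


def build_predict_request(pixels, host="localhost", port=8501, model_name="crnn"):
    """Build (but do not send) a TensorFlow Serving REST predict request.

    host, port and model_name are placeholder values, not taken from the repo.
    """
    url = "http://{}:{}/v1/models/{}:predict".format(host, port, model_name)
    # "instances" is the row-format request body of the TF Serving REST API.
    body = json.dumps({"instances": [pixels]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Tiny 2x2 dummy "image"; a real request would carry the preprocessed input.
request = build_predict_request([[0.0, 0.5], [1.0, 0.25]])
print(request.full_url)

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

Keeping request construction separate from sending makes it easy to inspect the payload before pointing the client at a live server.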

Experiment

The original experiment ran for 2000000 epochs, with a batch size of 32, an initial learning rate of 0.01 and exponential decay of 0.1 every 500000 epochs. During training the train loss dropped as follows
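
The learning rate schedule described here can be written out directly. The sketch below is plain Python rather than the repo's training code; it uses the same formula as TensorFlow's `tf.train.exponential_decay`, that is `lr * decay_rate ^ (step / decay_steps)`. Whether the original run used the staircase variant is not stated, so `staircase` is an assumed parameter, and the function name is hypothetical.

```python
def exponential_decay(initial_lr, step, decay_steps, decay_rate, staircase=False):
    """Learning rate after `step` updates under exponential decay."""
    if staircase:
        # Decay in discrete jumps: the exponent only changes every decay_steps.
        exponent = step // decay_steps
    else:
        # Decay smoothly at every step.
        exponent = step / decay_steps
    return initial_lr * decay_rate ** exponent


# Values from the experiment description: lr 0.01, decay 0.1 every 500000.
for step in (0, 500000, 1000000, 2000000):
    print(step, exponential_decay(0.01, step, 500000, 0.1, staircase=True))
```

With these settings the learning rate drops by a factor of ten at each 500000 boundary, from 0.01 at the start to roughly 1e-6 at step 2000000.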

Training loss

The val loss dropped as follows

Validation_loss

2019.3.27 Updates

I have uploaded a newly trained CRNN model for a Chinese dataset, which can be found here. I do not know who owns the dataset, but I am grateful for their great work. If you know the owner, please let me know. The pretrained weights can be found here

Before you start training, you may need to reorganize the dataset's label information to match the Synth90k dataset's format if you want to use the same data feed pipeline mentioned above. I have also reimplemented a more efficient tfrecords writer, which speeds up the generation of the tfrecords file. You may refer to the code for details. Some information about training is listed below:

image size: (280, 32)

classes nums: 5824 without blank

sequence length: 70

training sample counts: 2733004

validation sample counts: 364401

testing sample counts: 546601

batch size: 32

training iter nums: 200000

init lr: 0.01
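
Two of the numbers above are related by simple arithmetic, sketched below. The width reduction factor of 4 is inferred from the listed values (an image width of 280 and a sequence length of 70) and is an assumption about the network, as are the function names.

```python
def sequence_length(image_width, width_stride=4):
    """Number of time steps fed to the RNN for a given input width.

    width_stride=4 is an assumed downsampling factor, chosen because it
    maps the listed width of 280 to the listed sequence length of 70.
    """
    return image_width // width_stride


def ctc_output_classes(num_characters):
    """CTC needs one extra output class for the blank symbol."""
    return num_characters + 1


print(sequence_length(280))      # prints 70
print(ctc_output_classes(5824))  # prints 5825
```

Under the same assumption, the 100-pixel-wide Synth90k images would yield 25 time steps.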

Test example images

Example test_01.jpg

Example image1

Example test_02.jpg

Example image2

Example test_03.jpg

Example image3

training tboard file

Training loss

The val loss dropped as follows

Validation_loss

2019.4.10 Updates

Added a small demo that recognizes a Chinese PDF using the Chinese CRNN model weights. To try it, run the following commands:

cd CRNN_ROOT_REPO
python tools/recongnize_chinese_pdf.py -c ./data/char_dict/char_dict_cn.json \
    -o ./data/char_dict/ord_map_cn.json --weights_path model/crnn_chinese/shadownet.ckpt \
    --image_path data/test_images/test_pdf.png --save_path pdf_recognize_result.txt

You should see a result like the following:

The left image is the recognition result displayed on the console and the right image is the original PDF image.

recognize_result_console

The left image is the recognition result written to a local file and the right image is the original PDF image.

recognize_result_file

TODO

  • Add new model weights trained on the whole synth90k dataset
  • Add multiple gpu training scripts
  • Add new pretrained model on chinese dataset
  • Add an online toy demo
  • Add tensorflow service script

Acknowledgement

Please cite my repo CRNN_Tensorflow if you use it.

Contact

Scan the following QR code to discuss :)

Owner
MaybeShewill-CV
Engineer from Baidu