CNN+LSTM+CTC-based OCR implemented using TensorFlow.

Last update: Dec 08, 2022

Overview

CNN_LSTM_CTC_Tensorflow

CNN+LSTM+CTC-based OCR (Optical Character Recognition) implemented using TensorFlow.
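As a rough illustration of the CTC part, here is a minimal pure-Python sketch of greedy (best-path) CTC decoding: collapse repeated ids, then drop the blank. It is not the code used in this repo. The character set, the blank index, and the function names are assumptions made for the example, and `frames` stands in for the per-time-step argmax of the network output.

```python
from itertools import groupby

# Hypothetical character set for illustration; the blank takes the last index.
CHARSET = "0123456789+-*()"
BLANK_ID = len(CHARSET)


def ctc_greedy_decode(frame_ids, blank_id):
    """Collapse consecutive repeats, then remove blanks (CTC best-path rule)."""
    collapsed = [class_id for class_id, _ in groupby(frame_ids)]
    return [class_id for class_id in collapsed if class_id != blank_id]


def ids_to_text(ids):
    """Map class ids back to characters of the hypothetical charset."""
    return "".join(CHARSET[i] for i in ids)


# 12 time steps collapse to a 3-character string.
frames = [BLANK_ID, 1, 1, BLANK_ID, BLANK_ID, 10, 10, 10, BLANK_ID, 4, 4, BLANK_ID]
decoded_text = ids_to_text(ctc_greedy_decode(frames, BLANK_ID))
print(decoded_text)  # 1+4
```

Because the output length depends only on how many non-blank runs appear, the same network can emit short or long labels without a fixed character count.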

Note: there is no restriction on the number of characters in the image (variable length). Have a look at the image below.

I trained a model with 100k images using this code and got 99.75% accuracy on the test dataset (200k images) in the competition. The images in both datasets:

Update 2017.11.6:

The competition page is no longer available. If you want to reproduce this result, please see this issue about the dataset. The label file (a .txt file) is in the same folder as the images after extracting the .tar.gz file.

Update 2018.4.24:

Updated to TensorFlow 1.7 and fixed some bugs reported in issue #8.

Structure

The images are first processed by a CNN to extract features, then these extracted features are fed into an LSTM for character recognition.

For simplicity, the CNN is just Convolution + Batch Normalization + Leaky ReLU + Max Pooling, and the LSTM is a 2-layer stacked LSTM. You can also try a bidirectional LSTM.
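To get a feel for how the CNN output becomes an LSTM input sequence, the sketch below computes the sequence shape for a given image size. It makes several assumptions that are not confirmed by this README: each block ends in a 2x2 max pooling with stride 2 and "same" padding, the number of blocks (`num_pool_layers`) is a made-up value, and each remaining image column becomes one time step.

```python
import math


def sequence_shape(image_height, image_width, out_channels, num_pool_layers):
    """Return (time_steps, features_per_step) after repeated 2x2 pooling.

    Assumes stride-2 pooling with "same" padding, so each layer halves
    height and width, rounding up.
    """
    height, width = image_height, image_width
    for _ in range(num_pool_layers):
        height = math.ceil(height / 2)
        width = math.ceil(width / 2)
    # One time step per remaining column; rows and channels are flattened.
    return width, height * out_channels


# 60x180 input with 64 output channels, assuming 4 pooling layers.
time_steps, features_per_step = sequence_shape(60, 180, 64, num_pool_layers=4)
print(time_steps, features_per_step)  # 12 256
```

Under these assumptions, adding or removing a pooling layer changes the number of time steps, which bounds how many characters CTC can emit for one image.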

You can play with the network architecture (add dropout to the CNN, stack more LSTM layers, etc.) and see what happens. Have a look at the CNN part and the LSTM part.

Prerequisite

Python 3.6.4
TensorFlow 1.2
OpenCV 3 (optional, used to read images).

How to run

There are many other parameters you can play with; have a look at utils.py.

To be clear, num_classes is not one of the parameters mentioned above.
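For a CTC-trained model, the number of classes is normally the size of the character set plus one for the blank symbol. The sketch below shows that relationship and how a label string could be turned into class ids. The character set and the names `CHARSET`, `NUM_CLASSES`, and `encode_label` are assumptions for illustration, not the values defined in utils.py.

```python
# Hypothetical character set for arithmetic-expression images.
CHARSET = "0123456789+-*()"
CHAR_TO_ID = {ch: i for i, ch in enumerate(CHARSET)}

# CTC needs one extra class for the blank symbol.
NUM_CLASSES = len(CHARSET) + 1


def encode_label(text):
    """Map a label string to class ids; raises KeyError on unknown characters."""
    return [CHAR_TO_ID[ch] for ch in text]


encoded = encode_label("(1+4)*2")
print(NUM_CLASSES, encoded)  # 16 [13, 1, 10, 4, 14, 12, 2]
```

If you change the character set for your own data, the class count has to change with it.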

# cd to your workspace.
# The code evaluates accuracy every validation_steps steps, as specified in the parameters.

ls -R
  .:
  imgs  utils.py  helper.py  main.py  cnn_lstm_otc_ocr.py

  ./imgs:
  train  infer  val  labels.txt
  
  ./imgs/train:
  1.png  2.png  ...  50000.png
  
  ./imgs/val:
  1.png  2.png  ...  50000.png

  ./imgs/infer:
  1.png  2.png  ...  300000.png
   
  
# Train the model.
CUDA_VISIBLE_DEVICES=0 python ./main.py --train_dir=./imgs/train/ \
  --val_dir=./imgs/val/ \
  --image_height=60 \
  --image_width=180 \
  --image_channel=1 \
  --out_channels=64 \
  --num_hidden=128 \
  --batch_size=128 \
  --log_dir=./log/train \
  --num_gpus=1 \
  --mode=train

# Inference
CUDA_VISIBLE_DEVICES=0 python ./main.py --infer_dir=./imgs/infer/ \
  --checkpoint_dir=./checkpoint/ \
  --num_gpus=0 \
  --mode=infer

Run with your own data.

Prepare your data and make sure that all images are named in the format id_label.jpg, e.g. 004_(1+4)*2.jpg.

# Make sure the data path is correct; have a look at helper.py.
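As an illustration of the naming convention, the following sketch splits a file name into its id and label at the first underscore. It is not the code in helper.py; the function name `parse_image_name` and the example path are assumptions.

```python
import os


def parse_image_name(path):
    """Split 'id_label.ext' into (id, label) at the first underscore."""
    stem, _ext = os.path.splitext(os.path.basename(path))
    image_id, sep, label = stem.partition("_")
    if not sep or not label:
        raise ValueError(f"expected id_label.<ext>, got {path!r}")
    return image_id, label


image_id, label = parse_image_name("imgs/train/004_(1+4)*2.jpg")
print(image_id, label)  # 004 (1+4)*2
```

Splitting only at the first underscore keeps any later underscores as part of the label.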

python helper.py

Then run the commands described in How to run.


Owner

Watson Yang
