Use Convolutional Recurrent Neural Network to recognize the Handwritten line text image without pre segmentation into words or characters. Use CTC loss Function to train.

Overview

Handwritten Line Text Recognition using Deep Learning with Tensorflow

Description

Use Convolutional Recurrent Neural Network to recognize the Handwritten line text image without pre segmentation into words or characters. Use CTC loss Function to train. More read this Medium Post

Why Deep Learning?

Why Deep Learning

Deep Learning self extracts features with a deep neural networks and classify itself. Compare to traditional Algorithms it performance increase with Amount of Data.

Basic Intuition on How it Works.

Step_wise_detail

  • First Use Convolutional Recurrent Neural Network to extract the important features from the handwritten line text Image.
  • The output before CNN FC layer (512x100x8) is passed to the BLSTM which is for sequence dependency and time-sequence operations.
  • Then CTC LOSS Alex Graves is used to train the RNN which eliminate the Alignment problem in Handwritten, since handwritten have different alignment of every writers. We just gave the what is written in the image (Ground Truth Text) and BLSTM output, then it calculates loss simply as -log("gtText"); aim to minimize negative maximum likelihood path.
  • Finally CTC finds out the possible paths from the given labels. Loss is given by for (X,Y) pair is: Ctc_Loss
  • Finally CTC Decode is used to decode the output during Prediction.

Detail Project Workflow

Architecture of Model

  • Project consists of Three steps:
    1. Multi-scale feature Extraction --> Convolutional Neural Network 7 Layers
    2. Sequence Labeling (BLSTM-CTC) --> Recurrent Neural Network (2 layers of LSTM) with CTC
    3. Transcription --> Decoding the output of the RNN (CTC decode) DetailModelArchitecture

Requirements

  1. Tensorflow 1.8.0
  2. Flask
  3. Numpy
  4. OpenCv 3
  5. Spell Checker autocorrect >=0.3.0 pip install autocorrect

Dataset Used

  • IAM dataset download from here
  • Only needed the lines images and lines.txt (ASCII).
  • Place the downloaded files inside data directory
The Trained model is available and download from this link. The trained model CER=8.32% and trained on IAM dataset with some additional created dataset.

To Train the model from scratch

$ python main.py --train

To validate the model

$ python main.py --validate

To Prediction

$ python main.py

Run in Web with Flask

$ python upload.py
Validation character error rate of saved model: 8.654728%
Python: 3.6.4 
Tensorflow: 1.8.0
Init with stored values from ../model/snapshot-24
Without Correction clothed leaf by leaf with the dioappoistmest
With Correction clothed leaf by leaf with the dioappoistmest

Prediction output on IAM Test Data PredictionOutput

Prediction output on Self Test Data PredictionOutput

See the project Devnagari Handwritten Word Recognition with Deep Learning for more insights.

Further Improvement

  • Using MDLSTM to recognize whole paragraph at once Scan, Attend and Read: End-to-End Handwritten Paragraph Recognition with MDLSTM Attention
  • Line segementation can be added for full paragraph text recognition. For line segmentation you can use A* path planning algorithm or CNN model to seperate paragraph into lines.
  • Better Image preprocessing such as: reduce backgoround noise to handle real time image more accurately.
  • Better Decoding approach to improve accuracy. Some of the CTC Decoder found here

Feel Free to improve this project with pull Request.

This is part of my last semester project of Computer Engineering From Tribhuvan University. July 2019

Owner
sushant097
Machine Learning Engineer | Computer Vision Developer. Working in the field of Research, development of Machine learning and Computer Vision .
sushant097
Some bits of javascript to transcribe scanned pages using PageXML

nashi (nasḫī) Some bits of javascript to transcribe scanned pages using PageXML. Both ltr and rtl languages are supported. Try it! But wait, there's m

Andreas Büttner 15 Nov 09, 2022
Introduction to Augmented Reality (AR) with Python 3 and OpenCV 4.2.

Introduction to Augmented Reality (AR) with Python 3 and OpenCV 4.2.

fernanda rodríguez 85 Jan 02, 2023
Perspective recovery of text using transformed ellipses

unproject_text Perspective recovery of text using transformed ellipses. See full writeup at https://mzucker.github.io/2016/10/11/unprojecting-text-wit

Matt Zucker 111 Nov 13, 2022
How to detect objects in real time by using Jupyter Notebook and Neural Networks , by using Yolo3

Real Time Object Recognition From your Screen Desktop . In this post, I will explain how to build a simply program to detect objects from you desktop

Ruslan Magana Vsevolodovna 2 Sep 28, 2022
Programa que viabiliza a OCR (Optical Character Reading - leitura óptica de caracteres) de um PDF.

Este programa tem o intuito de ser um modificador de arquivos PDF. Os arquivos PDFs podem ser 3: PDFs verdadeiros - em que podem ser selecionados o ti

Daniel Soares Saldanha 2 Oct 11, 2021
Balabobapy - Using artificial intelligence algorithms to continue the text

Balabobapy - Using artificial intelligence algorithms to continue the text

qxtony 1 Feb 04, 2022
A Python script to capture images from multiple webcams at once and save them into your local machine

Capturing multiple images at once from Webcam Using OpenCV Capture multiple image by accessing the webcam of your system and save it to your machine.

Fazal ur Rehman 2 Apr 16, 2022
A document scanner application for laptops/desktops developed using python, Tkinter and OpenCV.

DcoumentScanner A document scanner application for laptops/desktops developed using python, Tkinter and OpenCV. Directly install the .exe file to inst

Harsh Vardhan Singh 1 Oct 29, 2021
~1000 book pages + OpenCV + python = page regions identified as paragraphs, lines, images, captions, etc.

cosc428-structor I had an open-ended Computer Vision assignment to complete, and an out-of-copyright book that I wanted to turn into an ebook. Convent

Chad Oliver 45 Dec 06, 2022
Image Smoothing and Blurring Using OpenCV

Image-Smoothing-and-Blurring-Using-OpenCV This repository contains codes for performing image smoothing and blurring using OpenCV. There are different

Happy N. Monday 3 Feb 15, 2022
This repository provides train&test code, dataset, det.&rec. annotation, evaluation script, annotation tool, and ranking.

SCUT-CTW1500 Datasets We have updated annotations for both train and test set. Train: 1000 images [images][annos] Additional point annotation for each

Yuliang Liu 600 Dec 18, 2022
Pre-Recognize Library - library with algorithms for improving OCR quality.

PRLib - Pre-Recognition Library. The main aim of the library - prepare image for recogntion. Image processing can really help to improve recognition q

Alex 80 Dec 30, 2022
TextField: Learning A Deep Direction Field for Irregular Scene Text Detection (TIP 2019)

TextField: Learning A Deep Direction Field for Irregular Scene Text Detection Introduction The code and trained models of: TextField: Learning A Deep

Yukang Wang 101 Dec 12, 2022
LEARN OPENCV IN 3 HOURS USING PYTHON - INCLUDING EXAMPLE PROJECTS

LEARN OPENCV IN 3 HOURS USING PYTHON - INCLUDING EXAMPLE PROJECTS

Murtaza Hassan 815 Dec 29, 2022
PyNeuro is designed to connect NeuroSky's MindWave EEG device to Python and provide Callback functionality to provide data to your application in real time.

PyNeuro PyNeuro is designed to connect NeuroSky's MindWave EEG device to Python and provide Callback functionality to provide data to your application

Zach Wang 45 Dec 30, 2022
Detect the mathematical formula from the given picture and the same formula is extracted and converted into the latex code

Mathematical formulae extractor The goal of this project is to create a learning based system that takes an image of a math formula and returns corres

6 May 22, 2022
Text language identification using Wikipedia data

Text language identification using Wikipedia data The aim of this project is to provide high-quality language detection over all the web's languages.

Vsevolod Dyomkin 28 Jul 09, 2022
Scale-aware Automatic Augmentation for Object Detection (CVPR 2021)

SA-AutoAug Scale-aware Automatic Augmentation for Object Detection Yukang Chen, Yanwei Li, Tao Kong, Lu Qi, Ruihang Chu, Lei Li, Jiaya Jia [Paper] [Bi

Jia Research Lab 182 Dec 29, 2022
This project modify tensorflow object detection api code to predict oriented bounding boxes. It can be used for scene text detection.

This is an oriented object detector based on tensorflow object detection API. Most of the code is not changed except for those related to the need of

Dafang He 30 Oct 22, 2022
PAGE XML format collection for document image page content and more

PAGE-XML PAGE XML format collection for document image page content and more For an introduction, please see the following publication: http://www.pri

PRImA Research Lab 46 Nov 14, 2022