TextBoxes++: A Single-Shot Oriented Scene Text Detector

Overview

TextBoxes++: A Single-Shot Oriented Scene Text Detector

Introduction

This is an application for scene text detection (TextBoxes++) and recognition (CRNN).

TextBoxes++ is a unified framework for oriented scene text detection with a single network. It is an extended work of TextBoxes. CRNN is an open-source text recognizer. The code of TextBoxes++ is based on SSD and TextBoxes. The code of CRNN is modified from CRNN.

For more details, please refer to our arXiv paper.

Citing the related works

Please cite the related works in your publications if it helps your research:

@article{Liao2018Text,
  title = {{TextBoxes++}: A Single-Shot Oriented Scene Text Detector},
  author = {Minghui Liao, Baoguang Shi and Xiang Bai},
  journal = {{IEEE} Transactions on Image Processing},
  doi  = {10.1109/TIP.2018.2825107},
  url = {https://doi.org/10.1109/TIP.2018.2825107},
  volume = {27},
  number = {8},
  pages = {3676--3690},
  year = {2018}
}

@inproceedings{LiaoSBWL17,
  author    = {Minghui Liao and
               Baoguang Shi and
               Xiang Bai and
               Xinggang Wang and
               Wenyu Liu},
  title     = {TextBoxes: {A} Fast Text Detector with a Single Deep Neural Network},
  booktitle = {AAAI},
  year      = {2017}
}

@article{ShiBY17,
  author    = {Baoguang Shi and
               Xiang Bai and
               Cong Yao},
  title     = {An End-to-End Trainable Neural Network for Image-Based Sequence Recognition
               and Its Application to Scene Text Recognition},
  journal   = {{IEEE} TPAMI},
  volume    = {39},
  number    = {11},
  pages     = {2298--2304},
  year      = {2017}
}

Contents

  1. Requirements
  2. Installation
  3. Docker
  4. Models
  5. Demo
  6. Train

Requirements

NOTE There is partial support for a docker image. See docker/README.md. (Thank you for the PR from @mdbenito)

Torch7 for CRNN; 
g++-5; cuda8.0; cudnn V5.1 (cudnn 6 and cudnn 7 may fail); opencv3.0

Please refer to Caffe Installation to ensure other dependencies;

Installation

  1. compile TextBoxes++ (This is a modified version of caffe so you do not need to install the official caffe)
# Modify Makefile.config according to your Caffe installation.
cp Makefile.config.example Makefile.config
make -j8
# Make sure to include $CAFFE_ROOT/python to your PYTHONPATH.
make py
  1. compile CRNN (Please refer to CRNN if you have trouble with the compilation.)
cd crnn/src/
sh build_cpp.sh

Docker

(Thanks for the PR from @idotobi)

Build Docke Image

docker build -t tbpp_crnn:gpu .

This can take +1h, so go get a coffee ;)

Once this is done you can start a container via nvidia-docker.

nvidia-docker run -it --rm tbpp_crnn:gpu bash

To check if the GPU is available inside the docker container you can run nvidia-smi.

It's recommendable to mount the ./models and ./crnn/model/ directories to include the downloaded models.

nvidia-docker run -it \
                  --rm \
                  -v ${PWD}/models:/opt/caffe/models \ 
                  -v ${PWD}/crrn/model:/opt/caffe/crrn/model \
                  tbpp_crnn:gpu bash

For convenince this command is executed when running ./run.bash.

Models

  1. pre-trained model on SynthText (used for training): Dropbox; BaiduYun

  2. model trained on ICDAR 2015 Incidental Text (used for testing): Dropbox; BaiduYun

    Please place the above models in "./models/"

    If your data is hugely different from ICDAR 2015 Incidental Text,you'd better train it on your own data based on the pre-trained model on SynthText.

  3. CRNN model: Dropbox; BaiduYun

    Please place the crnn model in "./crnn/model/"

Demo

Download the ICDAR 2015 model and place it in "./models/"

python examples/text/demo.py

The detection results and recognition results are in "./demo_images"

Train

Create lmdb data

  1. convert ground truth into "xml" form: example.xml

  2. create train/test lists (train.txt / test.txt) in "./data/text/" with the following form:

     path_to_example1.jpg path_to_example1.xml
     path_to_example2.jpg path_to_example2.xml
    
  3. Run "./data/text/creat_data.sh"

Start training

1. modify the lmdb path in modelConfig.py
2. Run "python examples/text/train.py"
Owner
Minghui Liao
Minghui Liao, a Ph.D. student of Huazhong University of Science and Technology.
Minghui Liao
Ddddocr - 通用验证码识别OCR pypi版

带带弟弟OCR通用验证码识别SDK免费开源版 今天ddddocr又更新啦! 当前版本为1.3.1 想必很多做验证码的新手,一定头疼碰到点选类型的图像,做样本费时

Sml2h3 4.4k Dec 31, 2022
Simple app for visual editing of Page XML files

Name nw-page-editor - Simple app for visual editing of Page XML files. Version: 2021.02.22 Description nw-page-editor is an application for viewing/ed

Mauricio Villegas 27 Jun 20, 2022
CNN+LSTM+CTC based OCR implemented using tensorflow.

CNN_LSTM_CTC_Tensorflow CNN+LSTM+CTC based OCR(Optical Character Recognition) implemented using tensorflow. Note: there is No restriction on the numbe

Watson Yang 356 Dec 08, 2022
Read Japanese manga inside browser with selectable text.

mokuro Read Japanese manga with selectable text inside a browser. See demo: https://kha-white.github.io/manga-demo mokuro_demo.mp4 Demo contains excer

Maciej Budyś 170 Dec 27, 2022
The open source extract transaction infomation by using OCR.

Transaction OCR Mã nguồn trích xuất thông tin transaction từ file scaned pdf, ở đây tôi lựa chọn tài liệu sao kê công khai của Thuy Tien. Mã nguồn có

Nguyen Xuan Hung 18 Jun 02, 2022
Code for CVPR 2022 paper "SoftGroup for Instance Segmentation on 3D Point Clouds"

SoftGroup We provide code for reproducing results of the paper SoftGroup for 3D Instance Segmentation on Point Clouds (CVPR 2022) Author: Thang Vu, Ko

Thang Vu 231 Dec 27, 2022
The CIS OCR PostCorrectionTool

The CIS OCR Post Correction Tool PoCoTo Source code for the Java-based PoCoTo client enabling fast interactive batch corrections of complete OCR error

CIS OCR Group 36 Dec 15, 2022
OpenGait is a flexible and extensible gait recognition project

A flexible and extensible framework for gait recognition. You can focus on designing your own models and comparing with state-of-the-arts easily with the help of OpenGait.

Shiqi Yu 335 Dec 22, 2022
Kornia is a open source differentiable computer vision library for PyTorch.

Open Source Differentiable Computer Vision Library

kornia 7.6k Jan 06, 2023
Apply different text recognition services to images of handwritten documents.

Handprint The Handwritten Page Recognition Test is a command-line program that invokes HTR (handwritten text recognition) services on images of docume

Caltech Library 117 Jan 02, 2023
This is an API written in python that uses FastAPI. It is a simple API that can detect discord tokens in Images.

Welcome This is an API written in python that uses FastAPI. It is a simple API that can detect discord tokens in Images. Installation There are curren

8 Jul 29, 2022
Program created with opencv that allows you to automatically count your repetitions on several fitness exercises.

Virtual partner of gym Description Program created with opencv that allows you to automatically count your repetitions on several fitness exercises li

1 Jan 04, 2022
Crop regions in napari manually

napari-crop Crop regions in napari manually Usage Create a new shapes layer to annotate the region you would like to crop: Use the rectangle tool to a

Robert Haase 4 Sep 29, 2022
SemTorch

SemTorch This repository contains different deep learning architectures definitions that can be applied to image segmentation. All the architectures a

David Lacalle Castillo 154 Dec 07, 2022
PyNeuro is designed to connect NeuroSky's MindWave EEG device to Python and provide Callback functionality to provide data to your application in real time.

PyNeuro PyNeuro is designed to connect NeuroSky's MindWave EEG device to Python and provide Callback functionality to provide data to your application

Zach Wang 45 Dec 30, 2022
Forked from argman/EAST for the ICPR MTWI 2018 CHALLENGE

EAST_ICPR: EAST for ICPR MTWI 2018 CHALLENGE Introduction This is a repository forked from argman/EAST for the ICPR MTWI 2018 CHALLENGE. Origin Reposi

Haozheng Li 157 Aug 23, 2022
Automatically remove the mosaics in images and videos, or add mosaics to them.

Automatically remove the mosaics in images and videos, or add mosaics to them.

Hypo 1.4k Dec 30, 2022
CellProfiler is a open-source application for biological image analysis

CellProfiler is a free open-source software designed to enable biologists without training in computer vision or programming to quantitatively measure phenotypes from thousands of images automaticall

CellProfiler 732 Dec 23, 2022
一款基于Qt与OpenCV的仿真数字示波器

一款基于Qt与OpenCV的仿真数字示波器

郭赟 4 Nov 02, 2022
python ocr using tesseract/ with EAST opencv detector

pytextractor python ocr using tesseract/ with EAST opencv text detector Uses the EAST opencv detector defined here with pytesseract to extract text(de

Danny Crasto 38 Dec 05, 2022