A PyTorch implementation of ECCV2018 Paper: TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

Overview

TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

A PyTorch implementation of TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes (ECCV 2018) by Megvii

Paper

Comparison of different representations for text instances. (a) Axis-aligned rectangle. (b) Rotated rectangle. (c) Quadrangle. (d) TextSnake. Obviously, the proposed TextSnake representation is able to effectively and precisely describe the geometric properties, such as location, scale, and bending of curved text with perspective distortion, while the other representations (axis-aligned rectangle, rotated rectangle or quadrangle) struggle with giving accurate predictions in such cases.

TextSnake elements (a minimal sketch of this representation follows the list):

  • center point
  • tangent line
  • text region
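
As a rough illustration (not the repo's actual data structures), a text instance under this representation can be thought of as an ordered sequence of overlapping disks laid along the text center line, each carrying a center point, a radius, and a local orientation:

from dataclasses import dataclass
from typing import List

@dataclass
class Disk:
    cx: float      # center point x, lying on the text center line
    cy: float      # center point y
    radius: float  # roughly half of the local text height
    theta: float   # local orientation (direction of the tangent line)

# A text instance ("snake") is an ordered sequence of overlapping disks;
# their union covers the whole, possibly curved, text region.
TextInstance = List[Disk]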

Description

Generally, this code has the following features:

  1. includes complete training and inference code
  2. pure Python implementation, no extra compilation required
  3. compatible with the latest PyTorch version (written with PyTorch 0.4.0)
  4. supports the TotalText and SynthText datasets

Getting Started

This repo includes the training code and an inference demo of TextSnake; training and inference can each be run with a few commands.

Prerequisites

To run this repo successfully, the following environment is highly recommended:

  • Linux (Ubuntu 16.04)
  • Python 3.6
  • Anaconda3
  • NVIDIA GPU (8 GB or more of GPU memory for training, 2 GB for inference)

(I haven't tested it on other Python versions.)

  1. Clone this repository
$ git clone https://github.com/princewang1994/TextSnake.pytorch.git
  2. Install the required Python packages with pip
$ cd $TEXTSNAKE_ROOT
$ pip install -r requirements.txt

Data preparation

Pretraining with SynthText

$ CUDA_VISIBLE_DEVICES=$GPUID python train.py synthtext_pretrain --dataset synth-text --viz --max_epoch 1 --batch_size 8

Training

Train a model with a given experiment name $EXPNAME.

Training from scratch:

$ EXPNAME=example
$ CUDA_VISIBLE_DEVICES=$GPUID python train.py $EXPNAME --viz

Training with the pretrained model (improves performance significantly):

$ EXPNAME=example
$ CUDA_VISIBLE_DEVICES=$GPUID python train.py $EXPNAME --viz --batch_size 8 --resume save/synthtext_pretrain/textsnake_vgg_0.pth

options:

  • exp_name: experiment name, used to identify different training processes
  • --viz: visualization toggle, output pictures are saved to ./vis by default

Other options can be shown by running python train.py -h.
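
For readers unfamiliar with the --resume flag, the sketch below illustrates what resuming from a checkpoint typically looks like in PyTorch. The function name and checkpoint keys (model, optimizer, epoch) are assumptions for illustration, not this repo's exact code:

import torch

def resume_from(path, model, optimizer, device="cuda"):
    # e.g. path = "save/synthtext_pretrain/textsnake_vgg_0.pth"
    checkpoint = torch.load(path, map_location=device)
    model.load_state_dict(checkpoint["model"])          # restore network weights
    optimizer.load_state_dict(checkpoint["optimizer"])  # restore optimizer state
    return checkpoint.get("epoch", 0) + 1               # epoch to continue from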

Running tests

Running the following command generates demo results on the TotalText dataset (300 images); results are saved to ./vis by default:

$ EXPNAME=example
$ CUDA_VISIBLE_DEVICES=$GPUID python eval_textsnake.py $EXPNAME --checkepoch 190

options:

  • exp_name: experiment name, used to identify different training processes

Other options can be shown by running python eval_textsnake.py -h.

Evaluation

The Total-Text metric is implemented in dataset/total_text/Evaluation_Protocol/Python_scripts/Deteval.py. First modify input_dir in Deteval.py, then run one of the following commands to compute DetEval:

$ python dataset/total_text/Evaluation_Protocol/Python_scripts/Deteval.py $EXPNAME --tr 0.8 --tp 0.4

or

$ python dataset/total_text/Evaluation_Protocol/Python_scripts/Deteval.py $EXPNAME --tr 0.7 --tp 0.6

Either command outputs a metrics report.
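
The tr and tp arguments are the area-recall and area-precision thresholds used when matching detections to ground-truth regions. As a rough, simplified one-to-one illustration of that matching rule (not the actual Deteval.py, which also handles one-to-many and many-to-one cases), assuming shapely is available:

from shapely.geometry import Polygon

def deteval_f1(gt_polys, det_polys, tr=0.7, tp=0.6):
    """gt_polys / det_polys: lists of point lists, e.g. [(x1, y1), (x2, y2), ...]."""
    gts = [Polygon(p) for p in gt_polys]
    dets = [Polygon(p) for p in det_polys]
    matched_gt, matched_det = set(), set()
    for i, g in enumerate(gts):
        for j, d in enumerate(dets):
            if j in matched_det:
                continue
            inter = g.intersection(d).area
            # area recall >= tr and area precision >= tp -> count as a match
            if g.area > 0 and d.area > 0 and inter / g.area >= tr and inter / d.area >= tp:
                matched_gt.add(i)
                matched_det.add(j)
                break
    recall = len(matched_gt) / max(len(gts), 1)
    precision = len(matched_det) / max(len(dets), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-8)
    return precision, recall, f1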

Pretrained Models

Download the model from the links above and place the .pth file at the corresponding path (save/XXX/textsnake_vgg_XX.pth).

Performance

DetEval reporting

The following table reports DetEval metrics with VGG as the backbone (results can be reproduced using the pretrained model from the Pretrained Models section):

Setting                 tr=0.7 / tp=0.6 (P | R | F1)    tr=0.8 / tp=0.4 (P | R | F1)    FPS (single 1080Ti)
expand / no merge       0.652 | 0.549 | 0.596           0.874 | 0.711 | 0.784           12.07
expand / merge          0.698 | 0.578 | 0.633           0.859 | 0.660 | 0.746           8.38
no expand / no merge    0.753 | 0.693 | 0.722           0.695 | 0.628 | 0.660           9.94
no expand / merge       0.747 | 0.677 | 0.710           0.691 | 0.602 | 0.643           11.05
reported in paper       -                               0.827 | 0.745 | 0.784           -

* expand denotes expanding the radius by 0.3 times during post-processing

* merge denotes merging overlapping instances during post-processing

Pure Inference

You can also run prediction on your own dataset without annotations:

  1. Download the pretrained model and place the .pth file at save/pretrained/textsnake_vgg_180.pth
  2. Run the pure inference script as follows:
$ EXPNAME=pretrained
$ CUDA_VISIBLE_DEVICES=$GPUID python demo.py $EXPNAME --checkepoch 180 --img_root /path/to/image

Predicted results will be saved in output/$EXPNAME and visualizations in vis/${EXPNAME}_deploy.

Qualitative results

  • left: prediction / ground truth
  • middle: text region (TR)
  • right: text center line (TCL)
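
For context, the TR and TCL maps shown above drive the post-processing described in the paper: a pixel is treated as part of the text center line only if both its TCL and TR scores are high, and text instances are then reconstructed from disks along that masked center line. A minimal sketch of the masking step (thresholds are illustrative, not this repo's tuned values):

import numpy as np

def center_line_mask(tr_prob: np.ndarray, tcl_prob: np.ndarray,
                     tr_thresh: float = 0.4, tcl_thresh: float = 0.6) -> np.ndarray:
    # keep a pixel only if it lies inside a text region (TR)
    # and on the predicted text center line (TCL)
    return (tr_prob > tr_thresh) & (tcl_prob > tcl_thresh)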

What is coming

  • Pretraining with SynthText
  • Metric computing
  • Pretrained model upload
  • Pure inference script
  • More dataset support: ICDAR15, CTW1500
  • Various backbone experiments

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Acknowledgement
