Forked from argman/EAST for the ICPR MTWI 2018 CHALLENGE

Overview

EAST_ICPR: EAST for ICPR MTWI 2018 CHALLENGE

Introduction

This is a repository forked from argman/EAST for the ICPR MTWI 2018 CHALLENGE.
Origin Repository: argman/EAST - A tensorflow implementation of EAST text detector
Origin Author: argman

Author: Haozheng Li
Email: [email protected] or [email protected]

Contents

  1. Transform
  2. Models
  3. Demo
  4. Train
  5. Test
  6. Results

Transform

Some data in the dataset is abnormal, just like ICPR MTWI 2018[BaiduYun]. Abnormal means that the ground true labels are anticlockwise, or the images are not in 3 channels. Then errors like 'poly in wrong direction' will occur while using argman/EAST.

So I wrote a matlab program to check and transform the dataset. The program named <transform.m> is in the folder 'data_transform/' and its parameters are descripted as bellow:

icpr_img_folder = 'image_9000\';                   %origin images
icpr_txt_folder = 'txt_9000\';                     %origin ground true labels
icdar_img_folder = 'ICPR2018\';                    %transformed images
icdar_gt_folder = 'ICPR2018\';                     %transformed ground true labels
icdar_img_abnormal_folder = 'ICPR2018_abnormal\';  %images not in 3 channels, which give errors in argman/EAST
icdar_gt_abnormal_folder = 'ICPR2018_abnormal\';   %transformed ground true labels

%images and ground true labels files must be renamed as <img_1>, <img_2>, ..., <img_xxx> while using argman/EAST
first_index =  0;                                  %first index of the dataset
transform_list_name = 'transform_list.txt';        %file name of the rename list

Note: For abnormal images not in 3 channels, please transform them to normal ones through other tools like Format Factory. Then add the right data to the <icdar_img_folder> and <icdar_gt_folder>, so finally you get a whole normal dataset which has been checked and transformed.

Models

  1. Resnet_V1_50 Models trained on ICPR MTWI 2018 (train): [100k steps] [500k steps] [1035k steps]
  2. Resnet_V1_101 Models trained on ICPR MTWI 2018 (train) + ICDAR 2017 MLT (train + val) + RCTW-17 (train): [100k steps]
  3. Resnet_V1_101 Models pre-trained on Models-2, then trained on just ICPR MTWI 2018 (train): [987k steps]
  4. (In argman/EAST) Resnet_V1_50 Models trained on ICDAR 2013 (train) + ICDAR 2015 (train): [50k steps]
  5. (In argman/EAST) Resnet_V1_50 Models provided by tensorflow slim: [slim_resnet_v1_50]

Demo

Download the pre-trained models and run:

python run_demo_server.py --checkpoint-path models/east_icpr2018_resnet_v1_50_rbox_1035k/

Then Open http://localhost:8769 for the web demo server, or get the results in 'static/results/'.
Note: See argman/EAST#demo for more details.

Train

Prepare the training set and run:

python multigpu_train.py --gpu_list=0 --input_size=512 --batch_size_per_gpu=8 \
--checkpoint_path=models/east_icpr2018_resnet_v1_50_rbox/ \
--text_scale=512 --training_data_path=data/ICPR2018/ --geometry=RBOX \
--learning_rate=0.0001 --num_readers=18 --max_steps=50000

Note 1: Images and ground true labels files must be renamed as <img_1>, <img_2>, ..., <img_xxx> while using argman/EAST. Please see the examples in the folder 'training_samples/'.
Note 2: If --restore=True, training will restore from checkpoint and ignore the --pretrained_model_path. If --restore=False, training will delete checkpoint and initialize with the --pretrained_model_path (if exists).
Note 3: See argman/EAST#train for more details.

Test

Names of the images in ICPR MTWI 2018 are abnormal. Like <LB1gXi2JVXXXXXUXFXXXXXXXXXX.jpg> but not <img_10001.jpg>. Then errors will occur while using argman/EAST#test.

So I wrote two matlab programs to rename and inversely rename the dataset. Before evaluating, run the program named <rename.m> to make names of the images normal. This program is in the folder 'data_transform/' and its parameters are descripted as bellow:

icpr_img_folder = 'image_10000\';                      %origin images
icdar_img_folder = 'ICPR2018_test\';                   %transformed images
icdar_img_abnormal_folder = 'ICPR2018_test_abnormal\'; %images not in 3 channels, which give errors in argman/EAST

icpr_count =  10000;                                   %first index of the dataset
rename_list_name = 'rename_list.txt';                  %file name of the rename list

Note: Just like <transform.m>, please transform abnormal images through other tools like Format Factory.

After you have prepared the test set, run:

python eval.py --test_data_path=data/ICPR2018/ --gpu_list=0 \
--checkpoint_path=models/east_icpr2018_resnet_v1_50_rbox_1035k/ --output_dir=results/1035k/

Then get the results in 'results/'.

Finally inversely rename the result labels files from <img_10001.txt> to <LB1gXi2JVXXXXXUXFXXXXXXXXXX.txt> according to the rename list generated by <rename.m>. Run the program named <rename_inverse.m> which is in the folder 'data_transform/' and its parameters are descripted as bellow:

rename_list_name = 'rename_list.txt';  %file name of the rename list
icpr_img_folder = 'image_10000\';      %origin images
icpr_txt_folder = 'results\';          %result labels files generated by argman/EAST
icdar_gt_folder = 'txt_10000\';        %inversely renamed result labels files

Then zip the results in 'txt_10000/' and submit it to the ICPR MTWI 2018 CHALLENGE.

Results

Finally our model <east_icpr2018_resnet_v1_50_rbox_1035k> rank 31 in the ICPR MTWI 2018 CHALLENGE:
image

Here are some results on ICPR MTWI 2018:
image
image
image
image
image

Have fun!! :)

Owner
Haozheng Li
Computer Vision, Reinforcement Learning
Haozheng Li
A simple QR-Code Reader in Python

A simple QR-Code Reader written in Python, that copies the content of a QR-Code directly into the copy clipboard.

Eric 1 Oct 28, 2021
A facial recognition device is a device that takes an image or a video of a human face and compares it to another image faces in a database.

A facial recognition device is a device that takes an image or a video of a human face and compares it to another image faces in a database. The structure, shape and proportions of the faces are comp

Pavankumar Khot 4 Mar 19, 2022
This repo contains a script that allows us to find range of colors in images using openCV, and then convert them into geo vectors.

Vectorizing color range This repo contains a script that allows us to find range of colors in images using openCV, and then convert them into geo vect

Development Seed 9 Jul 27, 2022
Can We Find Neurons that Cause Unrealistic Images in Deep Generative Networks?

Can We Find Neurons that Cause Unrealistic Images in Deep Generative Networks? Artifact Detection/Correction - Offcial PyTorch Implementation This rep

CHOI HWAN IL 23 Dec 20, 2022
Solution for Problem 1 by team codesquad for AIDL 2020. Uses ML Kit for OCR and OpenCV for image processing

CodeSquad PS1 Solution for Problem Statement 1 for AIDL 2020 conducted by @unifynd technologies. Problem Given images of bills/invoices, the task was

Burhanuddin Udaipurwala 111 Nov 27, 2022
A bot that plays TFT using OCR. Keeps track of bench, board, items, and plays the user defined team comp.

NOTES: To ensure best results, make sure you are running this on a computer that has decent specs. 1920x1080 fullscreen is required in League, game mu

francis 125 Dec 30, 2022
Developed an AI-based system to control the mouse cursor using Python and OpenCV with the real-time camera.

Developed an AI-based system to control the mouse cursor using Python and OpenCV with the real-time camera. Fingertip location is mapped to RGB images to control the mouse cursor.

Ravi Sharma 71 Dec 20, 2022
Official PyTorch implementation for "Mixed supervision for surface-defect detection: from weakly to fully supervised learning"

Mixed supervision for surface-defect detection: from weakly to fully supervised learning [Computers in Industry 2021] Official PyTorch implementation

ViCoS Lab 169 Dec 30, 2022
Virtual Zoom Gesture using OpenCV

Virtual_Zoom_Gesture I have created a virtual zoom gesture where we can Zoom in and Zoom out any image and even we can move that image anywhere on the

Mudit Sinha 2 Dec 26, 2021
Rest API Written In Python To Classify NSFW Images.

✨ NSFW Classifier API ✨ Rest API Written In Python To Classify NSFW Images. Fastest Solution If you don't want to selfhost it, there's already an inst

Akshay Rajput 23 Dec 30, 2022
Satoshi is a discord bot template in python using discord.py that allow you to track some live crypto prices with your own discord bot.

Satoshi ~ DiscordCryptoBot Satoshi is a simple python discord bot using discord.py that allow you to track your favorites cryptos prices with your own

Théo 2 Sep 15, 2022
CVPR 2021 Oral paper "LED2-Net: Monocular 360˚ Layout Estimation via Differentiable Depth Rendering" official PyTorch implementation.

LED2-Net This is PyTorch implementation of our CVPR 2021 Oral paper "LED2-Net: Monocular 360˚ Layout Estimation via Differentiable Depth Rendering". Y

Fu-En Wang 83 Jan 04, 2023
This is a repository to learn and get more computer vision skills, make robotics projects integrating the computer vision as a perception tool and create a lot of awesome advanced controllers for the robots of the future.

This is a repository to learn and get more computer vision skills, make robotics projects integrating the computer vision as a perception tool and create a lot of awesome advanced controllers for the

Elkin Javier Guerra Galeano 17 Nov 03, 2022
Line based ATR Engine based on OCRopy

OCR Engine based on OCRopy and Kraken using python3. It is designed to both be easy to use from the command line but also be modular to be integrated

948 Dec 23, 2022
A python programusing Tkinter graphics library to randomize questions and answers contained in text files

RaffleOfQuestions Um programa simples em python, utilizando a biblioteca gráfica Tkinter para randomizar perguntas e respostas contidas em arquivos de

Gabriel Ferreira Rodrigues 1 Dec 16, 2021
Script para controlar o movimento do mouse usando Python e openCV com câmera em tempo real que detecta pontos de referência da mão, rastreia padrões de gestos em vez de um mouse físico.

mouserController Script para controlar o movimento do mouse usando Python e openCV com câmera em tempo real que detecta pontos de referência da mão, r

Vinícius Azevedo 6 Jun 28, 2022
An advanced 2D image manipulation with features such as edge detection and image segmentation built using OpenCV

OpenCV-ToothPaint3-Advanced-Digital-Image-Editor This application named ‘Tooth Paint’ version TP_2020.3 (64-bit) or version 3 was developed within a w

JunHong 1 Nov 05, 2021
Pytorch implementation of PSEnet with Pyramid Attention Network as feature extractor

Scene Text-Spotting based on PSEnet+CRNN Pytorch implementation of an end to end Text-Spotter with a PSEnet text detector and CRNN text recognizer. We

azhar shaikh 62 Oct 10, 2022
Some bits of javascript to transcribe scanned pages using PageXML

nashi (nasḫī) Some bits of javascript to transcribe scanned pages using PageXML. Both ltr and rtl languages are supported. Try it! But wait, there's m

Andreas Büttner 15 Nov 09, 2022
A semi-automatic open-source tool for Layout Analysis and Region EXtraction on early printed books.

LAREX LAREX is a semi-automatic open-source tool for layout analysis on early printed books. It uses a rule based connected components approach which

162 Jan 05, 2023