TensorFlow Implementation of FOTS, Fast Oriented Text Spotting with a Unified Network.

Overview

FOTS: Fast Oriented Text Spotting with a Unified Network

I am still working on this repo. updates and detailed instructions are coming soon!

Table of Contens

TensorFlow Versions

As for now, the pre-training code is tested on TensorFlow 1.12, 1.14 and 1.15. I may try to implement 2.x version in the future.

Other Requirements

GCC >= 6

Trained Models

Datasets

Train

Pre-train with SynthText

  1. Download pre-trained ResNet-50 from TensorFlow-Slim image classification model library page and place it at 'ckpt/resnet_v1_50' dir.
cd ckpt/resnet_v1_50
wget http://download.tensorflow.org/models/resnet_v1_50_2016_08_28.tar.gz
tar -zxvf resnet_v1_50_2016_08_28.tar.gz
rm resnet_v1_50_2016_08_28.tar.gz
  1. Download Synth800k dataset and place it at data/SynthText/ dir to pre-train the whole net.

  2. Transform(Pre-process) the SynthText data into the ICDAR data format.

python data_provider/SynthText2ICDAR.py
  1. Train with SynthText for 10 epochs(with 1 GPU).
python train.py \
  --max_steps=715625 \
  --gpu_list='0' \
  --checkpoint_path=ckpt/synthText_10eps/ \
  --pretrained_model_path=ckpt/resnet_v1_50/resnet_v1_50.ckpt \
  --training_img_data_dir=data/SynthText/ \
  --training_gt_data_dir=data/SynthText/ \
  --icdar=False \
  1. Visualize pre-pretraining progress with TensorBoard.
tensorboard --logdir=ckpt/synthText_10eps/

Finetune with ICDAR 2015, ICDAR 2017 MLT or ICDAR 2013

(if you are using the pre-trained model, place all of the files in ckpt/synthText_10eps/)

  • Combine ICDAR data before training.

    1. Place ICDAR data under tmp/ foler.
    2. Run the following script to combine the data.
    python combine_ICDAR_data.py --year [year of ICDAR to train(13 or 15 or 17)]
    
  • ICDAR 2017 MLT/pre-finetune for ICDAR 2013 or ICDAR 2015 (text detection task only)

    • Train the pre-trained model with 9,000 images from ICDAR 2017 MLT training and validation datasets(with 1 GPU).
    python train.py \
      --gpu_list='0' \
      --checkpoint_path=ckpt/ICDAR17MLT/ \
      --pretrained_model_path=ckpt/synthText_10eps/ \
      --train_stage=0 \
      --training_img_data_dir=data/ICDAR17MLT/imgs/ \
      --training_gt_data_dir=data/ICDAR17MLT/gts/
    
  • ICDAR 2015

    • Train the model with 1,000 images from ICDAR 2015 training dataset and 229 images from ICDAR 2013 training datasets(with 1 GPU).
    python train.py \
      --gpu_list='0' \
      --checkpoint_path=ckpt/ICDAR15/ \
      --pretrained_model_path=ckpt/ICDAR17MLT/ \
      --training_img_data_dir=data/ICDAR15+13/imgs/ \
      --training_gt_data_dir=data/ICDAR15+13/gts/
    
  • ICDAR 2013(horizontal text only)

    • Train the model with 229 images from ICDAR 2013 training datasets(with 1 GPU).
    python train.py \
      --gpu_list='0' \
      --checkpoint_path=ckpt/ICDAR13/ \
      --pretrained_model_path=ckpt/ICDAR17MLT/ \
      --training_img_data_dir=data/ICDAR13/imgs/ \
      --training_gt_data_dir=data/ICDAR13/gts/
    

Test

Place some images in test_imgs/ dir and specify a trained checkpoint path to see the test result.

python test.py --test_data_path test_imgs/ --checkpoint_path [checkpoint path]

References

Owner
Masao Taketani
Deep Learning research engineer, currently working in Tokyo, Japan. An ex-boxer, who is highly motivated to train one's mind and body.
Masao Taketani
Turn images of tables into CSV data. Detect tables from images and run OCR on the cells.

Table of Contents Overview Requirements Demo Modules Overview This python package contains modules to help with finding and extracting tabular data fr

Eric Ihli 311 Dec 24, 2022
Character Segmentation using TensorFlow

Character Segmentation Segment characters and spaces in one text line,from this paper Chinese English mixed Character Segmentation as Semantic Segment

26 Aug 25, 2022
Lightning Fast Language Prediction 🚀

whatthelang Lightning Fast Language Prediction 🚀 Dependencies The dependencies can be installed using the requirements.txt file: $ pip install -r req

Indix 152 Oct 16, 2022
Simple SDF mesh generation in Python

Generate 3D meshes based on SDFs (signed distance functions) with a dirt simple Python API.

Michael Fogleman 1.1k Jan 08, 2023
Using computer vision method to recognize and calcutate the features of the architecture.

building-feature-recognition In this repository, we accomplished building feature recognition using traditional/dl-assisted computer vision method. Th

4 Aug 11, 2022
EQFace: An implementation of EQFace: A Simple Explicit Quality Network for Face Recognition

EQFace: A Simple Explicit Quality Network for Face Recognition The first face recognition network that generates explicit face quality online.

DeepCam Shenzhen 141 Dec 31, 2022
This pyhton script converts a pdf to Image then using tesseract as OCR engine converts Image to Text

Script_Convertir_PDF_IMG_TXT Este script de pyhton convierte un pdf en Imagen luego utilizando tesseract como motor OCR convierte la Imagen a Texto. p

alebogado 1 Jan 27, 2022
A buffered and threaded wrapper for the OpenCV VideoCapture object. Can speed up video decoding significantly. Supports

A buffered and threaded wrapper for the OpenCV VideoCapture object. Can speed up video decoding significantly. Supports "with"-syntax.

Patrice Matz 0 Oct 30, 2021
Text to QR-CODE

QR CODE GENERATO USING PYTHON Author : RAFIK BOUDALIA. Installation Use the package manager pip to install foobar. pip install pyqrcode Usage from tki

Rafik Boudalia 2 Oct 13, 2021
[BMVC'21] Official PyTorch Implementation of Grounded Situation Recognition with Transformers

Grounded Situation Recognition with Transformers Paper | Model Checkpoint This is the official PyTorch implementation of Grounded Situation Recognitio

Junhyeong Cho 18 Jul 19, 2022
Code release for our paper, "SimNet: Enabling Robust Unknown Object Manipulation from Pure Synthetic Data via Stereo"

SimNet: Enabling Robust Unknown Object Manipulation from Pure Synthetic Data via Stereo Thomas Kollar, Michael Laskey, Kevin Stone, Brijen Thananjeyan

68 Dec 14, 2022
Implement 'Single Shot Text Detector with Regional Attention, ICCV 2017 Spotlight'

SSTDNet Implement 'Single Shot Text Detector with Regional Attention, ICCV 2017 Spotlight' using pytorch. This code is work for general object detecti

HotaekHan 84 Jan 05, 2022
This is the open source implementation of the ICLR2022 paper "StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis"

StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image

Meta Research 840 Dec 26, 2022
Augmenting Anchors by the Detector Itself

Augmenting Anchors by the Detector Itself Introduction It is difficult to determine the scale and aspect ratio of anchors for anchor-based object dete

4 Nov 06, 2022
Um simples projeto para fazer o reconhecimento do captcha usado pelo jogo bombcrypto

CaptchaSolver - LEIA ISSO 😓 Para iniciar o codigo: pip install -r requirements.txt python captcha_solver.py Se você deseja pegar ver o resultado das

Kawanderson 50 Mar 21, 2022
textspotter - An End-to-End TextSpotter with Explicit Alignment and Attention

An End-to-End TextSpotter with Explicit Alignment and Attention This is initially described in our CVPR 2018 paper. Getting Started Installation Clone

Tong He 323 Nov 10, 2022
An expandable and scalable OCR pipeline

Overview Nidaba is the central controller for the entire OGL OCR pipeline. It oversees and automates the process of converting raw images into citable

81 Jan 04, 2023
Using Opencv ,based on Augmental Reality(AR) and will show the feature matching of image and then by finding its matching

Using Opencv ,this project is based on Augmental Reality(AR) and will show the feature matching of image and then by finding its matching ,it will just mask that image . This project ,if used in cctv

1 Feb 13, 2022
Programa que viabiliza a OCR (Optical Character Reading - leitura óptica de caracteres) de um PDF.

Este programa tem o intuito de ser um modificador de arquivos PDF. Os arquivos PDFs podem ser 3: PDFs verdadeiros - em que podem ser selecionados o ti

Daniel Soares Saldanha 2 Oct 11, 2021
Python Computer Vision application that allows users to draw/erase on the screen using their webcam.

CV-Virtual-WhiteBoard The Virtual WhiteBoard is a project I made using the OpenCV and Mediapipe Python libraries. Using your index and middle finger y

Stephen Wang 1 Jan 07, 2022