Pytorch re-implementation of Paper: SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition (CVPR 2022)

Last update: Jan 03, 2023

Related tags

Deep Learning SwinTextSpotter

Overview

SwinTextSpotter

This is the pytorch implementation of Paper: SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition (CVPR 2022). The paper is available at this link.

We use the models pre-trained on ImageNet. The ImageNet pre-trained SwinTransformer backbone is obtained from SwinT_detectron2.

Models

SWINTS-swin-english-pretrain [config] | model_Google Drive | model_BaiduYun PW: 954t

SWINTS-swin-Total-Text [config] | model_Google Drive | model_BaiduYun PW: tf0i

SWINTS-swin-ctw [config] | model_Google Drive | model_BaiduYun PW: 4etq

SWINTS-swin-icdar2015 [config] | model_Google Drive | model_BaiduYun PW: 3n82

SWINTS-swin-ReCTS [config] | model_Google Drive | model_BaiduYun PW: a4be

SWINTS-swin-vintext [config] | model_Google Drive | model_BaiduYun PW: slmp

Installation

Python=3.8
PyTorch=1.8.0, torchvision=0.9.0, cudatoolkit=11.1
OpenCV for visualization

Steps

Install the repository (we recommend to use Anaconda for installation.)

conda create -n SWINTS python=3.8 -y
conda activate SWINTS
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install opencv-python
pip install scipy
pip install shapely
pip install rapidfuzz
pip install timm
pip install Polygon3
git clone https://github.com/mxin262/SwinTextSpotter.git
cd SwinTextSpotter
python setup.py build develop

dataset path

datasets
|_ totaltext
|  |_ train_images
|  |_ test_images
|  |_ totaltext_train.json
|  |_ weak_voc_new.txt
|  |_ weak_voc_pair_list.txt
|_ mlt2017
|  |_ train_images
|  |_ annotations/icdar_2017_mlt.json
.......

Downloaded images

ICDAR2017-MLT [image]
Syntext-150k:
- Part1: 94,723 [dataset]
- Part2: 54,327 [dataset]
ICDAR2015 [image]
ICDAR2013 [image]
Total-Text_train_images [image]
Total-Text_test_images [image]
ReCTs [images&label] PW: 2b4q
LSVT [images&label] PW: 9uh1
ArT [images&label] PW: 2865
SynChinese130k [images][label]
Vintext_images [image]

Downloaded label[Google Drive] [BaiduYun] PW: 46vd

Downloader lexicion[Google Drive] and place it to corresponding dataset.

You can also prepare your custom dataset following the example scripts. [example scripts]

Totaltext

To evaluate on Total Text, CTW1500, ICDAR2015, first download the zipped annotations with

cd datasets
mkdir evaluation
cd evaluation
wget -O gt_ctw1500.zip https://cloudstor.aarnet.edu.au/plus/s/xU3yeM3GnidiSTr/download
wget -O gt_totaltext.zip https://cloudstor.aarnet.edu.au/plus/s/SFHvin8BLUM4cNd/download
wget -O gt_icdar2015.zip https://drive.google.com/file/d/1wrq_-qIyb_8dhYVlDzLZTTajQzbic82Z/view?usp=sharing
wget -O gt_vintext.zip https://drive.google.com/file/d/11lNH0uKfWJ7Wc74PGshWCOgSxgEnUPEV/view?usp=sharing

Pretrain SWINTS (e.g., with Swin-Transformer backbone)

python projects/SWINTS/train_net.py \
  --num-gpus 8 \
  --config-file projects/SWINTS/configs/SWINTS-swin-pretrain.yaml

Fine-tune model on the mixed real dataset

python projects/SWINTS/train_net.py \
  --num-gpus 8 \
  --config-file projects/SWINTS/configs/SWINTS-swin-mixtrain.yaml

Fine-tune model

python projects/SWINTS/train_net.py \
  --num-gpus 8 \
  --config-file projects/SWINTS/configs/SWINTS-swin-finetune-totaltext.yaml

Evaluate SWINTS (e.g., with Swin-Transformer backbone)

python projects/SWINTS/train_net.py \
  --config-file projects/SWINTS/configs/SWINTS-swin-finetune-totaltext.yaml \
  --eval-only MODEL.WEIGHTS ./output/model_final.pth

Visualize the detection and recognition results (e.g., with ResNet50 backbone)

python demo/demo.py \
  --config-file projects/SWINTS/configs/SWINTS-swin-finetune-totaltext.yaml \
  --input input1.jpg \
  --output ./output \
  --confidence-threshold 0.4 \
  --opts MODEL.WEIGHTS ./output/model_final.pth

Example results:

Acknowlegement

Adelaidet, Detectron2, ISTR, SwinT_detectron2, Focal-Transformer and MaskTextSpotterV3.

Citation

If our paper helps your research, please cite it in your publications:

@article{huang2022swints,
  title = {SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition},
  author = {Mingxin Huang and YuLiang liu and Zhenghao Peng and Chongyu Liu and Dahua Lin and Shenggao Zhu and Nicholas Yuan and Kai Ding and Lianwen Jin},
  journal={arXiv preprint arXiv:2203.10209},
  year = {2022}
}

Copyright

For commercial purpose usage, please contact Dr. Lianwen Jin: [email protected]

Pytorch re-implementation of Paper: SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition (CVPR 2022)

Related tags

Overview

SwinTextSpotter

Models

Installation

Steps

Totaltext

Example results:

Acknowlegement

Citation

Copyright

Owner

mxin262

Convolutional Neural Network to detect deforestation in the Amazon Rainforest

Visualizing lattice vibration information from phonon dispersion to atoms (For GPUMD)

NNR conformation conditional and global probabilities estimation and analysis in peptides or proteins fragments

The source code for the Cutoff data augmentation approach proposed in this paper: "A Simple but Tough-to-Beat Data Augmentation Approach for Natural Language Understanding and Generation".

Implementation of "Learning Multi-Granular Hypergraphs for Video-Based Person Re-Identification"

Rust bindings for the C++ api of PyTorch.

Block Sparse movement pruning

atmaCup #11 の Public 4th / Pricvate 5th Solution のリポジトリです。

Applicator Kit for Modo allow you to apply Apple ARKit Face Tracking data from your iPhone or iPad to your characters in Modo.

Sum-Product Probabilistic Language

This is the official pytorch implementation of the BoxEL for the description logic EL++

A robust pointcloud registration pipeline based on correlation.

Vignette is a face tracking software for characters using osu!framework.

Code to reproduce results from the paper "AmbientGAN: Generative models from lossy measurements"

Few-shot NLP benchmark for unified, rigorous eval

Repository For Programmers Seeking a platform to show their skills

《LXMERT: Learning Cross-Modality Encoder Representations from Transformers》(EMNLP 2020)

A heterogeneous entity-augmented academic language model based on Open Academic Graph (OAG)

DM-ACME compatible implementation of the Arm26 environment from Mujoco

A knowledge base construction engine for richly formatted data