Simple and understandable swin-transformer OCR project

Last update: Dec 31, 2022

swin-transformer-ocr

Overview

Simple and understandable swin-transformer OCR project. The model in this repository relies heavily on high-level open-source projects such as timm and x_transformers. The training procedure is also easy to follow thanks to the readability of pytorch-lightning.

The model encodes the input image into a context vector using shifted windows, the encoding mechanism of swin-transformer, and then decodes that vector with a standard auto-regressive transformer.
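
To make the shifted-window idea concrete, the sketch below partitions a small grid of patch indices into non-overlapping windows, then does the same after a cyclic shift of half a window, following the description in the Swin Transformer paper. It is a toy illustration in plain Python, not code from this repository; the function names `cyclic_shift` and `window_partition` are hypothetical.

```python
def cyclic_shift(grid, shift):
    """Roll a 2D grid up and to the left by `shift` cells, wrapping around."""
    h, w = len(grid), len(grid[0])
    return [[grid[(r + shift) % h][(c + shift) % w] for c in range(w)]
            for r in range(h)]


def window_partition(grid, window):
    """Split a 2D grid into non-overlapping window x window blocks (row-major)."""
    h, w = len(grid), len(grid[0])
    if h % window or w % window:
        raise ValueError("grid size must be a multiple of the window size")
    windows = []
    for top in range(0, h, window):
        for left in range(0, w, window):
            windows.append([grid[r][c]
                            for r in range(top, top + window)
                            for c in range(left, left + window)])
    return windows


# 4x4 grid of patch indices 0..15, window size 2, shift of window // 2 = 1
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
regular = window_partition(grid, 2)
shifted = window_partition(cyclic_shift(grid, 1), 2)
```

Here `regular[0]` holds patches `[0, 1, 4, 5]`, while `shifted[0]` holds `[5, 6, 9, 10]`, one patch from each of the four regular windows. Alternating the two layouts is what lets attention computed inside windows pass information across window boundaries.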

If you are not familiar with the transformer OCR structure, transformer-ocr may be easier to understand because it uses a traditional convolutional network (ResNet-v2) as the encoder.
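
For readers new to this structure, the auto-regressive decoding step can be sketched as a greedy loop that feeds each predicted token back in as input until an end-of-sequence token appears. This is a minimal illustration under stated assumptions, not the repository's decoder: `greedy_decode`, `toy_scores`, and the token ids are hypothetical, and in the real model the scores would also depend on the encoder's context vector.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_token_scores: Callable[[Sequence[int]], Sequence[float]],
    bos_id: int,
    eos_id: int,
    max_len: int,
) -> List[int]:
    """Generate token ids one at a time, feeding each output back as input."""
    tokens = [bos_id]
    for _ in range(max_len):
        scores = next_token_scores(tokens)
        # Pick the index of the highest score (greedy choice).
        next_id = max(range(len(scores)), key=scores.__getitem__)
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens[1:]  # drop the begin-of-sequence marker


def toy_scores(prefix):
    """Toy stand-in for a decoder: emits token 2, then 3, then EOS (id 1)."""
    script = {1: 2, 2: 3, 3: 1}  # prefix length -> id of the top-scoring token
    target = script[len(prefix)]
    return [1.0 if i == target else 0.0 for i in range(4)]


decoded = greedy_decode(toy_scores, bos_id=0, eos_id=1, max_len=10)
```

The loop stops either at the end-of-sequence token or after `max_len` steps, so a decoder that never emits EOS still terminates.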

Performance

On a private Korean handwritten text dataset, the exact-match accuracy is 97.6%.
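
As a reference for what exact match means, the sketch below counts a prediction as correct only when the whole predicted string equals the label. It assumes strict string comparison with no case or whitespace normalization, which the source does not specify; `exact_match_accuracy` is a hypothetical helper, not a function from this repository.

```python
def exact_match_accuracy(predictions, labels):
    """Fraction of samples whose predicted string equals the label exactly."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(pred == label for pred, label in zip(predictions, labels))
    return correct / len(labels)


preds = ["Hello World.", "vision-transformer-ocr", "swin"]
truth = ["Hello World.", "vision-transformer-ocr", "Swin"]
score = exact_match_accuracy(preds, truth)  # the third pair differs in case
```

A single wrong character makes the whole sample count as wrong, so this metric is stricter than character-level accuracy.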

Data

./dataset/
├─ preprocessed_image/
│  ├─ cropped_image_0.jpg
│  ├─ cropped_image_1.jpg
│  ├─ ...
├─ train.txt
└─ val.txt

# in train.txt
cropped_image_0.jpg\tHello World.
cropped_image_1.jpg\tvision-transformer-ocr
...

Preprocess the data first. Crop each image to a word-level or sentence-level region and put all image files in a single directory. Provide the ground truth in a txt file, one sample per line: the image file name and its label, separated by a tab (\t).
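
The label format described above can be checked with a short parser like the following. It is a sketch, not the repository's dataset loader: `parse_label_lines` and `load_labels` are hypothetical names, and UTF-8 encoding is an assumption.

```python
from typing import Iterable, List, Tuple


def parse_label_lines(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Turn 'file_name<TAB>label' lines into (file_name, label) pairs."""
    samples = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        # Split on the first tab only, so the label is kept whole.
        name, sep, label = line.partition("\t")
        if not sep:
            raise ValueError(f"missing tab separator: {line!r}")
        samples.append((name, label))
    return samples


def load_labels(txt_path: str) -> List[Tuple[str, str]]:
    """Read a label file such as train.txt (UTF-8 is assumed here)."""
    with open(txt_path, encoding="utf-8") as f:
        return parse_label_lines(f)


example = [
    "cropped_image_0.jpg\tHello World.\n",
    "cropped_image_1.jpg\tvision-transformer-ocr\n",
]
samples = parse_label_lines(example)
```

Running a check like this over train.txt and val.txt before training catches lines where the tab was replaced by spaces.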

Configuration

In the settings/ directory you can find default.yaml, where almost every hyper-parameter can be set. Copy it and edit the copy for each experiment version. I recommend running with the default settings first, before changing anything.

Train

python run.py --version 0 --setting settings/default.yaml --num_workers 16 --batch_size 128

You can check your training log with tensorboard.

tensorboard --logdir tb_logs --bind_all

Predict

When your model finishes training, you can use your model for prediction.

python predict.py --setting <your_setting.yaml> --target <image_or_directory> --tokenizer <your_tokenizer_pkl> --checkpoint <saved_checkpoint>
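
Since `--target` accepts either a single image or a directory, one way such a target could be resolved into a list of image files is sketched below. This is an assumption for illustration only: `collect_images` and the extension set are hypothetical, and predict.py's own handling may differ.

```python
import tempfile
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def collect_images(target):
    """Return a sorted list of image paths for a file or directory target."""
    path = Path(target)
    if path.is_dir():
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
    if path.is_file():
        return [path]
    raise FileNotFoundError(f"target not found: {target}")


# Small demonstration on a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.jpg", "a.png", "notes.txt"):
        (Path(tmp) / name).touch()
    found = [p.name for p in collect_images(tmp)]
```

Sorting the paths keeps the prediction order stable between runs, which makes outputs easier to compare.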

Exporting to ONNX

You can export your model to ONNX format. pytorch-lightning makes this straightforward; see the pytorch-lightning documentation on ONNX export.

Citations

@misc{liu-2021,
    title         = {Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
    author        = {Ze Liu and Yutong Lin and Yue Cao and Han Hu and Yixuan Wei and Zheng Zhang and Stephen Lin and Baining Guo},
    year          = {2021},
    eprint        = {2103.14030},
    archivePrefix = {arXiv}
}

Owner

Ha YongWook
