Papers, datasets, algorithms, and SOTA results for scene text recognition (STR). Maintained long-term.


Scene Text Recognition Recommendations


Everything about Scene Text Recognition

SOTA Papers Datasets Code

Contents

1. Papers

All papers can be found here.

  • Latest papers:
    • Up to 2021-12-08
    • Up to 2021-12-03
    • Up to 2021-11-25

2. Datasets

2.1 Synthetic Datasets

| Dataset | Description | BaiduNetdisk link |
|---|---|---|
| SynthText | About 6 million synthetic text instances. Words are rendered onto natural images with random transformations. | Scene text datasets (extraction code: emco) |
| MJSynth | About 9 million synthetic text images generated from a lexicon of 90k common English words. | Scene text datasets (extraction code: emco) |

2.2 Benchmarks

| Dataset | Description | BaiduNetdisk link |
|---|---|---|
| IIIT5k-Words (IIIT5K) | 3,000 test image instances, taken from street scenes and from born-digital images. | Scene text datasets (extraction code: emco) |
| Street View Text (SVT) | 647 test image instances. Some images are severely corrupted by noise, blur, and low resolution. | Scene text datasets (extraction code: emco) |
| StreetViewText-Perspective (SVT-P) | 639 test image instances, designed specifically to evaluate recognition of perspective-distorted text. It is built from the original SVT dataset by selecting images of the same addresses on Google Street View taken from different view angles, so most text instances are heavily distorted by the non-frontal viewpoint. | Scene text datasets (extraction code: emco) |
| ICDAR 2003 (IC03) | 867 test image instances. | Scene text datasets (extraction code: mfir) |
| ICDAR 2013 (IC13) | 1,015 test image instances. | Scene text datasets (extraction code: emco) |
| ICDAR 2015 (IC15) | 2,077 test image instances. The images were captured with Google Glass without any control over image quality, so most of the text is small, blurred, and multi-oriented. | Scene text datasets (extraction code: emco) |
| CUTE80 (CUTE) | 288 test image instances, focused on curved text recognition. Most images have complex backgrounds, perspective distortion, and low resolution. | Scene text datasets (extraction code: emco) |
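The benchmarks above are usually scored by word accuracy: a prediction counts as correct only if the whole word matches the ground truth. The sketch below assumes a common case-insensitive protocol that keeps only digits and Latin letters before comparing; individual papers may use other settings, so check each one before comparing numbers.

```python
import re
from typing import Sequence


def normalize(word: str) -> str:
    """Lower-case and keep only [0-9a-z] (assumed evaluation protocol)."""
    return re.sub(r"[^0-9a-z]", "", word.lower())


def word_accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Percentage of samples whose normalized prediction equals the normalized label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(normalize(p) == normalize(g) for p, g in zip(predictions, labels))
    return 100.0 * correct / len(labels)


preds = ["Hotel", "coffee!", "STOP", "parking"]
labels = ["HOTEL", "Coffee", "SHOP", "Parking"]
print(word_accuracy(preds, labels))  # 75.0
```

Case and punctuation differences are forgiven under this protocol, but a single wrong letter ("STOP" vs "SHOP") makes the whole word wrong.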

3. Public Code

3.1. Frameworks

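PaddleOCR (Baidu)

  • PaddlePaddle/PaddleOCR
  • Features (excerpted from PaddleOCR):
    • Built on PaddlePaddle, Baidu's in-house deep learning framework
    • High-quality pretrained PP-OCR series models with accurate recognition
      • Ultra-lightweight PP-OCRv2 series: detection (3.1M) + direction classifier (1.4M) + recognition (8.5M) = 13.0M
      • Ultra-lightweight PP-OCR mobile series: detection (3.0M) + direction classifier (1.4M) + recognition (5.0M) = 9.4M
      • General-purpose PP-OCR server series: detection (47.1M) + direction classifier (1.4M) + recognition (94.9M) = 143.4M
      • Recognition of mixed Chinese, English, and digit text, vertical text, and long text
      • Multilingual recognition: Korean, Japanese, German, French
    • Rich, easy-to-use OCR tool components
      • PPOCRLabel, a semi-automatic data annotation tool for fast, efficient labeling
      • Style-Text, a data synthesis tool that batch-generates large numbers of images resembling the target scene
      • PP-Structure for document analysis: layout analysis and table recognition
    • Custom training, with a wide range of inference and deployment options
    • Quick installation via pip
    • Runs on Linux, Windows, macOS, and other systems
  • Supported algorithms (recognition):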
    • CRNN
    • Rosetta
    • STAR-Net
    • RARE
    • SRN
    • NRTR
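Each PP-OCR model size above is the sum of three stages: text detection, a direction classifier that catches text rotated by 180 degrees, and recognition. The following is a minimal sketch of how such stages can be chained, not PaddleOCR's API: `detect`, `is_upside_down`, and `recognize` are hypothetical stand-ins for real models, an image is modeled as a list of strings, and the `(top, left, bottom, right)` box format is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[str]                # toy "image": each row is a string of pixels
Box = Tuple[int, int, int, int]  # assumed format: (top, left, bottom, right)


@dataclass
class OCRResult:
    box: Box
    text: str
    flipped: bool  # True if the classifier rotated the crop by 180 degrees


def crop(image: Image, box: Box) -> Image:
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def rotate_180(patch: Image) -> Image:
    # Reverse the row order and the pixels within each row.
    return [row[::-1] for row in reversed(patch)]


def run_pipeline(image: Image,
                 detect: Callable[[Image], List[Box]],
                 is_upside_down: Callable[[Image], bool],
                 recognize: Callable[[Image], str]) -> List[OCRResult]:
    """Detection -> direction classification -> recognition, one box at a time."""
    results = []
    for box in detect(image):
        patch = crop(image, box)
        flipped = is_upside_down(patch)
        if flipped:
            patch = rotate_180(patch)
        results.append(OCRResult(box, recognize(patch), flipped))
    return results


# Hypothetical stand-ins for the three models.
def detect(img: Image) -> List[Box]:
    return [(0, 0, 1, 5), (1, 4, 2, 9)]


def is_upside_down(patch: Image) -> bool:
    return "".join(patch)[::-1] in {"WORLD"}


def recognize(patch: Image) -> str:
    return "".join(patch)


image = ["HELLO.....", "....DLROW."]
for r in run_pipeline(image, detect, is_upside_down, recognize):
    print(r.text, r.flipped)
# HELLO False
# WORLD True
```

Keeping the classifier as a separate small stage lets the recognizer assume upright input, which is one plausible reason the classifier adds only 1.4M to every model variant.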

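MMOCR (SenseTime)

  • open-mmlab/mmocr
  • Features (excerpted from MMOCR):
    • MMOCR is an open-source toolbox based on PyTorch and mmdetection, focused on text detection, text recognition, and their downstream tasks such as key information extraction. It is part of the OpenMMLab project.
    • Beyond text detection and recognition, the toolbox also supports downstream tasks such as key information extraction.
  • Supported algorithms (recognition):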
    • CRNN (TPAMI'2016)
    • NRTR (ICDAR'2019)
    • RobustScanner (ECCV'2020)
    • SAR (AAAI'2019)
    • SATRN (CVPR'2020 Workshop on Text and Documents in the Deep Learning Era)
    • SegOCR (Manuscript'2021)

Deep Text Recognition Benchmark (ClovaAI)
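Baek et al.'s Deep Text Recognition Benchmark splits an STR model into four stages: transformation, feature extraction, sequence modeling, and prediction. Enumerating the module options per stage, as listed in their ICCV 2019 paper, gives 24 combinations. The dictionary below mirrors those stages; treat the exact option spellings as an assumption and check the repository's README for its actual command-line flags.

```python
from itertools import product
from typing import Dict, List

# Module options per stage, following Baek et al. (ICCV 2019).
STAGES: Dict[str, List[str]] = {
    "Transformation": ["None", "TPS"],
    "FeatureExtraction": ["VGG", "RCNN", "ResNet"],
    "SequenceModeling": ["None", "BiLSTM"],
    "Prediction": ["CTC", "Attn"],
}


def all_configs() -> List[Dict[str, str]]:
    """Every combination of one module per stage."""
    stage_names = list(STAGES)
    return [dict(zip(stage_names, combo)) for combo in product(*STAGES.values())]


def model_name(config: Dict[str, str]) -> str:
    """Join the chosen modules in stage order, e.g. 'TPS-ResNet-BiLSTM-Attn'."""
    return "-".join(config[stage] for stage in STAGES)


configs = all_configs()
print(len(configs))             # 24
print(model_name(configs[0]))   # None-VGG-None-CTC
print(model_name(configs[-1]))  # TPS-ResNet-BiLSTM-Attn
```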


3.2. Algorithms

CRNN
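CRNN stacks a CNN, a bidirectional LSTM, and a CTC transcription layer. Below is a minimal sketch of best-path (greedy) CTC decoding: take the most likely class per frame, merge consecutive repeats, then drop blanks. The blank at index 0 and the toy alphabet are assumptions for illustration.

```python
from typing import List, Sequence

BLANK = 0  # assumption: class 0 is the CTC blank


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    Class i > 0 maps to alphabet[i - 1].
    """
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars: List[str] = []
    prev = None
    for idx in best:
        # Emit only when the class changes and is not the blank.
        if idx != prev and idx != BLANK:
            chars.append(alphabet[idx - 1])
        prev = idx
    return "".join(chars)


def one_hot(index: int, num_classes: int = 5) -> List[float]:
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


alphabet = "ehlo"  # toy alphabet: classes 1..4 = e, h, l, o
frames = [one_hot(i) for i in [2, 2, 1, 0, 3, 3, 0, 3, 4]]
print(ctc_greedy_decode(frames, alphabet))  # hello
```

The blank frame between the two runs of `l` is what allows a double letter; without it, `3, 3, 3` collapses to a single `l`.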


ASTER

  • TensorFlow, official, 651: bgshih/aster
    • Official implementation, in TensorFlow
  • PyTorch, 535: ayumuymk/aster.pytorch
    • PyTorch port; its accuracy is noticeably higher than reported in the original paper
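ASTER's decoder can read a word left to right and right to left (the SOTA table in Section 4 lists its L2R variant). In the bidirectional setting, ASTER keeps whichever output has the higher recognition score, i.e. the sum of log-probabilities of its predicted symbols. This sketch takes the per-step probabilities as given, whereas in a real model they come from the two decoders; the function names and inputs are hypothetical.

```python
import math
from typing import Sequence, Tuple

Hypothesis = Tuple[str, Sequence[float]]  # (decoded text, probability of each emitted symbol)


def recognition_score(step_probs: Sequence[float]) -> float:
    """Sum of log-probabilities of the emitted symbols."""
    return sum(math.log(p) for p in step_probs)


def pick_bidirectional(l2r: Hypothesis, r2l: Hypothesis) -> str:
    """Keep the higher-scoring hypothesis; ties go to the left-to-right one."""
    l2r_text, l2r_probs = l2r
    r2l_text, r2l_probs = r2l
    if recognition_score(r2l_probs) > recognition_score(l2r_probs):
        # The right-to-left decoder emits characters back to front.
        return r2l_text[::-1]
    return l2r_text


# Hypothetical decoder outputs for an image of the word "house".
l2r = ("hcuse", [0.9, 0.3, 0.9, 0.9, 0.9])
r2l = ("esuoh", [0.9, 0.9, 0.8, 0.9, 0.9])
print(pick_bidirectional(l2r, r2l))  # house
```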

MORANv2

  • PyTorch, official, 572: Canjie-Luo/MORAN_v2
    • MORAN v2: more stable single-stage training, a ResNet backbone, and a bidirectional decoder
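MORAN's rectification network (MORN) predicts an offset map and resamples the input with bilinear interpolation so that irregular text becomes easier to read. The sketch below shows only the sampling step, under simplifying assumptions: a grayscale image as nested lists, one `(dy, dx)` offset per pixel in pixel units, and coordinates clamped at the border. As far as we understand the paper, the real model predicts offsets on a coarser grid in normalized coordinates and upsamples them before sampling.

```python
from typing import List, Tuple

Image = List[List[float]]
Offsets = List[List[Tuple[float, float]]]  # (dy, dx) per output pixel, in pixels


def bilinear_sample(img: Image, y: float, x: float) -> float:
    """Bilinearly interpolate img at (y, x), clamping to the image border."""
    h, w = len(img), len(img[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
    bottom = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def rectify(img: Image, offsets: Offsets) -> Image:
    """Sample each output pixel (i, j) at (i + dy, j + dx) in the input."""
    return [
        [bilinear_sample(img, i + offsets[i][j][0], j + offsets[i][j][1])
         for j in range(len(img[0]))]
        for i in range(len(img))
    ]


img = [[0.0, 10.0, 20.0], [30.0, 40.0, 50.0]]
half_right = [[(0.0, 0.5)] * 3 for _ in range(2)]
print(rectify(img, half_right))  # [[5.0, 15.0, 20.0], [35.0, 45.0, 50.0]]
```

With all offsets at zero the output equals the input; non-zero offsets pull pixels from shifted positions, which is how a learned offset map can straighten curved or slanted text.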

4. SOTA

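Regular datasets: IIIT, SVT, IC13. Irregular datasets: IC15, SVTP, CUTE. Numbers in parentheses are the test-set sizes of the benchmark versions used.

| Model | Year | IIIT | SVT | IC13 (857) | IC13 (1015) | IC15 (1811) | IC15 (2077) | SVTP | CUTE |
|---|---|---|---|---|---|---|---|---|---|
| CRNN | 2015 | 78.2 | 80.8 | - | 86.7 | - | - | - | - |
| ASTER (L2R) | 2015 | 92.67 | 91.16 | - | 90.74 | 76.1 | - | 78.76 | 76.39 |
| CombBest | 2019 | 87.9 | 87.5 | 93.6 | 92.3 | 77.6 | 71.8 | 79.2 | 74 |
| ESIR | 2019 | 93.3 | 90.2 | - | 91.3 | - | 76.9 | 79.6 | 83.3 |
| SE-ASTER | 2020 | 93.8 | 89.6 | - | 92.8 | 80 |  | 81.4 | 83.6 |
| DAN | 2020 | 94.3 | 89.2 | - | 93.9 | - | 74.5 | 80 | 84.4 |
| RobustScanner | 2020 | 95.3 | 88.1 | - | 94.8 | - | 77.1 | 79.5 | 90.3 |
| AutoSTR | 2020 | 94.7 | 90.9 | - | 94.2 | 81.8 | - | 81.7 | - |
| Yang et al. | 2020 | 94.7 | 88.9 | - | 93.2 | 79.5 | 77.1 | 80.9 | 85.4 |
| SATRN | 2020 | 92.8 | 91.3 | - | 94.1 | - | 79 | 86.5 | 87.8 |
| SRN | 2020 | 94.8 | 91.5 | 95.5 | - | 82.7 | - | 85.1 | 87.8 |
| GA-SPIN | 2021 | 95.2 | 90.9 | - | 94.8 | 82.8 | 79.5 | 83.2 | 87.5 |
| PREN2D | 2021 | 95.6 | 94 | 96.4 | - | 83 | - | 87.6 | 91.7 |
| Bhunia et al. | 2021 | 95.2 | 92.2 | - | 95.5 | - | 84 | 85.7 | 89.7 |
| VisionLAN | 2021 | 95.8 | 91.7 | 95.7 | - | 83.7 | - | 86 | 88.5 |
| ABINet | 2021 | 96.2 | 93.5 | 97.4 | - | 86.0 | - | 89.3 | 89.2 |
| MATRN | 2021 | 96.7 | 94.9 | 97.9 | 95.8 | 86.6 | 82.9 | 90.5 | 94.1 |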
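Because the benchmarks differ in size, a single summary number is sometimes computed as the accuracy averaged over benchmarks and weighted by test-set size. This sketch assumes the IC13 (1015) and IC15 (2077) versions and takes the other sizes from the benchmark table in Section 2.2. Missing entries are skipped, so averages over different subsets are not directly comparable. The MATRN figure printed at the end is computed here from the table, not a number reported by the MATRN authors.

```python
from typing import Dict, Optional

# Test-set sizes from Section 2.2 (assumes the IC13-1015 and IC15-2077 versions).
BENCHMARK_SIZES: Dict[str, int] = {
    "IIIT": 3000, "SVT": 647, "IC13": 1015, "IC15": 2077, "SVTP": 639, "CUTE": 288,
}


def weighted_average(accuracies: Dict[str, Optional[float]]) -> float:
    """Accuracy averaged over benchmarks with a reported value, weighted by test-set size."""
    reported = {name: acc for name, acc in accuracies.items() if acc is not None}
    if not reported:
        raise ValueError("no reported accuracies")
    total = sum(BENCHMARK_SIZES[name] for name in reported)
    return sum(acc * BENCHMARK_SIZES[name] for name, acc in reported.items()) / total


matrn = {"IIIT": 96.7, "SVT": 94.9, "IC13": 95.8, "IC15": 82.9, "SVTP": 90.5, "CUTE": 94.1}
print(round(weighted_average(matrn), 2))  # 92.08
```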

Baek's Reimplementation Version


Owner
Deep Learning and Vision Computing Lab, SCUT