A Keras reproduction of the scene text detection network CTPN: "Detecting Text in Natural Image with Connectionist Text Proposal Network". You are welcome to try it, follow it, and report issues...

Last update: Jan 09, 2023

keras-ctpn


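1. Description
2. Prediction
3. Training
4. Examples
4.1 ICDAR2015
4.1.1 With side-refinement
4.1.2 Without side-refinement
4.1.3 Data augmentation: horizontal flip
4.2 ICDAR2017
4.3 Other datasets
5. toDoList
6. Summary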

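Description

This project is a Keras implementation of CTPN: Detecting Text in Natural Image with Connectionist Text Proposal Network. The implementation is largely based on keras-faster-rcnn. The model has been trained and tested on the ICDAR2015 and ICDAR2017 datasets.

Project repository: keras-ctpn

Chinese translation of the CTPN paper: CTPN.md

Results:

The model was trained on the 1000 ICDAR2015 training images and evaluated on the 500-image test set. It reached Recall: 37.07 %, Precision: 42.94 %, Hmean: 39.79 %. The original paper reports an F-measure of 61%, but it was trained with an additional 3000 images.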
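Hmean is the harmonic mean of recall R and precision P: Hmean = 2PR / (P + R). The minimal Python sketch below reproduces the 39.79 % figure from the recall and precision above. The function name hmean is used only for illustration.

```python
def hmean(recall: float, precision: float) -> float:
    """Harmonic mean (F-measure) of recall and precision, both given in percent."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# Recompute the reported Hmean from the reported recall and precision.
print(f"{hmean(37.07, 42.94):.2f}")  # prints: 39.79
```

The harmonic mean is dominated by the smaller of the two values. A model cannot reach a high Hmean by trading recall for precision, or the other way round.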

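Key points:

a. The backbone network is ResNet50.

b. The training input size is 720*720. The long side of each image is resized to 720 while keeping the aspect ratio, and the short side is padded. The original paper resizes the short side to 600. Prediction uses 1024*1024.

c. batch_size is 4. For training, 128 anchors are sampled per image, with a 1:1 ratio of positive to negative samples.

d. The loss weights for classification, box regression, and side-refinement are 1:1:1. The original paper uses 1:1:2.

e. Side-refinement uses the same positive anchors as box regression. In the original paper they appear to be selected separately.

f. Side-refinement does help (note: many people online claim it has little effect).

g. Because the network contains a bidirectional GRU, horizontal flipping hurts the results (see the example "Data augmentation: horizontal flip").

h. With random cropping as data augmentation, the network does not converge.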
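Points c, d and e can be illustrated with a minimal pure-Python sketch. This is not the code of this repository. The following are all assumptions made for illustration:

- the label encoding (1 = positive, 0 = negative, -1 = ignored);
- the rule of taking up to 64 positives and filling the remaining slots with negatives;
- every function name.

The vertical and side-refinement targets follow the formulas in the CTPN paper. The loss weights default to the 1:1:1 used here, and the paper's 1:1:2 can be passed through weights.

```python
import math
import random

# Hyper-parameters from points c and d above.
ANCHORS_PER_IMAGE = 128          # anchors sampled per image
POSITIVE_FRACTION = 0.5          # positive:negative = 1:1
LOSS_WEIGHTS = (1.0, 1.0, 1.0)   # cls : vertical regression : side-refinement (paper: 1 : 1 : 2)


def sample_anchors(labels, rng=random):
    """Sample anchor indices for one image.

    labels: 1 = positive, 0 = negative, -1 = ignored (an assumed encoding).
    Takes up to 64 positives and fills the remaining slots with negatives.
    """
    positives = [i for i, label in enumerate(labels) if label == 1]
    negatives = [i for i, label in enumerate(labels) if label == 0]
    num_pos = min(len(positives), int(ANCHORS_PER_IMAGE * POSITIVE_FRACTION))
    num_neg = min(len(negatives), ANCHORS_PER_IMAGE - num_pos)
    return rng.sample(positives, num_pos), rng.sample(negatives, num_neg)


def regression_targets(anchor, gt):
    """Vertical and side-refinement targets for one positive anchor.

    anchor, gt: (x1, y1, x2, y2) boxes; gt is the text line the anchor belongs to.
    Formulas from the CTPN paper:
        v_c = (c_y - c_y^a) / h^a,  v_h = log(h / h^a),  o = (x_side - c_x^a) / w^a
    x_side is taken as whichever side of the text line is closer to the anchor.
    """
    ax1, ay1, ax2, ay2 = anchor
    gx1, gy1, gx2, gy2 = gt
    anchor_w, anchor_h = ax2 - ax1, ay2 - ay1
    anchor_cx, anchor_cy = (ax1 + ax2) / 2, (ay1 + ay2) / 2
    gt_h, gt_cy = gy2 - gy1, (gy1 + gy2) / 2

    v_c = (gt_cy - anchor_cy) / anchor_h
    v_h = math.log(gt_h / anchor_h)
    x_side = gx1 if abs(gx1 - anchor_cx) <= abs(gx2 - anchor_cx) else gx2
    o = (x_side - anchor_cx) / anchor_w
    return v_c, v_h, o


def total_loss(cls_loss, reg_loss, side_loss, weights=LOSS_WEIGHTS):
    """Weighted sum of the classification, vertical regression and side-refinement losses."""
    w_cls, w_reg, w_side = weights
    return w_cls * cls_loss + w_reg * reg_loss + w_side * side_loss


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [1] * 10 + [0] * 500 + [-1] * 50
    pos, neg = sample_anchors(labels, rng)
    print(len(pos), len(neg))  # prints: 10 118

    # Point e: the side-refinement target is computed for the same positive
    # anchor as the vertical regression targets.
    print(regression_targets((32, 20, 48, 50), (30, 25, 200, 55)))
```

When an image has fewer than 64 positive anchors, this sketch keeps the batch at 128 by adding more negatives. It does not shrink the negatives to match the positives. Both readings of "1:1" are plausible, and this one is only an assumption.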

Prediction

a. Download the project

git clone https://github.com/yizt/keras-ctpn

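b. Download the pretrained model

A model trained on the ICDAR2015 training set is available from: google drive, Baidu Netdisk (extraction code: wm47)

c. Set the following attribute of the config class in config.py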

	WEIGHT_PATH = '/tmp/ctpn.h5'

d. Detect text

python predict.py --image_path image_3.jpg

Evaluation

a. Run the following command, then compress the output .txt files into a zip archive

python evaluate.py --weight_path /tmp/ctpn.100.h5 --image_dir /opt/dataset/OCR/ICDAR_2015/test_images/ --output_dir /tmp/output_2015/

b. Submit for online evaluation: upload the zip archive at http://rrc.cvc.uab.es/?ch=4&com=mymethods&task=1
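Step a leaves one .txt result file per test image in --output_dir. The minimal Python sketch below packs those files into the zip archive. It assumes the online evaluator expects the .txt files at the top level of the archive, with no enclosing folder. The function name zip_results is hypothetical.

```python
import os
import zipfile


def zip_results(output_dir, zip_path):
    """Pack every .txt file in output_dir into zip_path.

    Files are stored at the archive root (arcname is the bare file name, no
    folder prefix), which is assumed to be what the online evaluator expects.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(os.listdir(output_dir)):
            if name.endswith(".txt"):
                zf.write(os.path.join(output_dir, name), arcname=name)
    return zip_path
```

With the evaluate.py command above, a call such as zip_results("/tmp/output_2015/", "/tmp/submit.zip") would produce the archive to upload.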

Training

a. Download the training data

#icdar2013
wget http://rrc.cvc.uab.es/downloads/Challenge2_Training_Task12_Images.zip
wget http://rrc.cvc.uab.es/downloads/Challenge2_Training_Task1_GT.zip
wget http://rrc.cvc.uab.es/downloads/Challenge2_Test_Task12_Images.zip

#icdar2015
wget http://rrc.cvc.uab.es/downloads/ch4_training_images.zip
wget http://rrc.cvc.uab.es/downloads/ch4_training_localization_transcription_gt.zip
wget http://rrc.cvc.uab.es/downloads/ch4_test_images.zip

#icdar2017
wget -c -t 0 http://datasets.cvc.uab.es/rrc/ch8_training_images_{1..8}.zip
wget -c -t 0 http://datasets.cvc.uab.es/rrc/ch8_training_localization_transcription_gt_v2.zip
wget -c -t 0 http://datasets.cvc.uab.es/rrc/ch8_test_images.zip

b. Download the pretrained ResNet50 model

wget https://github.com/fchollet/deep-learning-models/releases/download/v0.2/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5

c. Set the following attributes of the config class in config.py

	# Pretrained model
	PRE_TRAINED_WEIGHT = '/opt/pretrained_model/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5'

	# Dataset paths
	IMAGE_DIR = '/opt/dataset/OCR/ICDAR_2015/train_images'
	IMAGE_GT_DIR = '/opt/dataset/OCR/ICDAR_2015/train_gt'
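IMAGE_GT_DIR points at the ICDAR2015 ground-truth files. Below is a minimal sketch of reading one of them. It assumes the standard ICDAR2015 layout:

- gt_img_N.txt belongs to img_N.jpg;
- each word is one "x1,y1,x2,y2,x3,y3,x4,y4,transcription" line;
- "###" marks a do-not-care region;
- a UTF-8 byte-order mark may appear at the start of the file.

The helper names are hypothetical, and this is not necessarily how keras-ctpn itself parses the data.

```python
import os


def gt_path_for(image_name, gt_dir):
    """Map an ICDAR2015 image name to its GT file: 'img_1.jpg' -> 'gt_img_1.txt'."""
    stem = os.path.splitext(image_name)[0]
    return os.path.join(gt_dir, f"gt_{stem}.txt")


def load_icdar2015_gt(gt_path):
    """Parse one ICDAR2015 ground-truth file into (quad, ignore) pairs.

    Each line is 'x1,y1,x2,y2,x3,y3,x4,y4,transcription'; a transcription of
    '###' marks a do-not-care region, reported here as ignore=True.
    """
    boxes = []
    # 'utf-8-sig' drops a leading byte-order mark if there is one.
    with open(gt_path, encoding="utf-8-sig") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            parts = line.split(",")
            quad = [int(float(v)) for v in parts[:8]]
            text = ",".join(parts[8:])  # the transcription may itself contain commas
            boxes.append((quad, text == "###"))
    return boxes
```

For example, load_icdar2015_gt(gt_path_for("img_1.jpg", IMAGE_GT_DIR)) would return the word quadrilaterals of the first training image, with ignore=True on the "###" regions.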

d. Train

python train.py --epochs 50

Examples

ICDAR2015

With side-refinement

Without side-refinement

Data augmentation: horizontal flip

ICDAR2017

Other datasets

toDoList

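Side-refinement (done)
Training on the ICDAR2017 dataset (done)
Mapping detected text-line coordinates back to the original image (done)
Accuracy evaluation (done)
Side regression constrained to within the bounding box (done)
Horizontal flip augmentation (done)
Random crop augmentation (done)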
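Mapping detected text-line coordinates back to the original image reverses the resizing described in point b of the key points. Below is a minimal sketch under two assumptions. First, boxes are axis-aligned (x1, y1, x2, y2). Second, the padding is added only on the bottom and right, so only the scale has to be undone. The function names are hypothetical. If the repository places the padding elsewhere, the padding offset would also have to be subtracted.

```python
def resize_and_pad_params(height, width, target=720):
    """Scale factor, resized size and padding for a target x target canvas.

    The long side is scaled to `target` with the aspect ratio kept, and the
    short side is padded up to `target`. Padding is assumed to go on the
    bottom/right only, so no offset has to be undone later.
    """
    scale = target / max(height, width)
    new_h, new_w = round(height * scale), round(width * scale)
    return scale, (new_h, new_w), (target - new_h, target - new_w)


def boxes_to_original(boxes, scale, height, width):
    """Map (x1, y1, x2, y2) boxes predicted on the padded canvas back to the
    original image by undoing the scale and clipping to the image bounds."""
    def clip(v, upper):
        return min(max(v, 0.0), upper)

    return [
        (clip(x1 / scale, width - 1), clip(y1 / scale, height - 1),
         clip(x2 / scale, width - 1), clip(y2 / scale, height - 1))
        for x1, y1, x2, y2 in boxes
    ]


if __name__ == "__main__":
    # A 720x1280 (height x width) image at the 1024*1024 prediction size.
    scale, new_size, padding = resize_and_pad_params(720, 1280, target=1024)
    print(scale, new_size, padding)  # prints: 0.8 (576, 1024) (448, 0)
    print(boxes_to_original([(80, 40, 400, 120)], scale, 720, 1280))
```

Boxes that spill into the padded area are clipped to the original image.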

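Summary

CTPN works well for detecting horizontal text.
The network is very sensitive to the dataset. A model trained on ICDAR2017 performs poorly when tested on ICDAR2015, and likewise a model trained on ICDAR2015 performs poorly on ICDAR2013.
A guess: could this be because the bidirectional GRU gives the network a kind of memory? With random cropping as data augmentation, the network did not converge. With horizontal flipping, the predictions also came out horizontally mirrored.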

Owner

mick.yi