第一届西安交通大学人工智能实践大赛(2018AI实践大赛--图片文字识别)第一名;仅采用densenet识别图中文字

Overview

OCR

第一届西安交通大学人工智能实践大赛(2018AI实践大赛--图片文字识别)冠军

模型结果

该比赛计算每一个条目的f1score,取所有条目的平均,具体计算方式在这里。这里的计算方式不对一句话里的相同文字重复计算,故f1score比提交的最终结果低:

- train val
f1score 0.9911 0.9582
recall 0.9943 0.9574
precision 0.9894 0.9637

模型说明

  1. 模型

采用densenet结构,模型输入为(64×512)的图片,输出为(8×64×2159)的概率。

将图片划分为多个(8×8)的方格,在每个方格预测2159个字符的概率。

  1. Loss

将(8×64×2159)的概率沿着长宽方向取最大值,得到(2159)的概率,表示这张图片里有对应字符的概率。

balance: 对正例和负例分别计算loss,使得正例loss权重之和与负例loss权重之和相等,解决数据不平衡的问题。

hard-mining

  1. 文字检测 将(8×64×2159)的概率沿着宽方向取最大值,得到(64×2159)的概率。 沿着长方向一个个方格预测文字,然后连起来可得到一句完整的语句。

存在问题:两个连续的文字无法重复检测

下图是一个文字识别正确的示例:的长为半径作圆

下图是一个文字识别错误的示例:为10元;经粗加工后销售,每

文件目录

ocr
|
|--code
|
|--files
|	|
|	|--train.csv
|
|--data
	|
	|--dataset
	|	|
	|	|--train
	|	|
	|	|--test
	|
	|--result
	|	|
	|	|--test_result.csv
	|
	|--images		此文件夹放置任何图片均可,我放的celebA数据集用作pretrain

运行环境

Ubuntu16.04, python2.7, CUDA9.0

安装pytorch, 推荐版本: 0.2.0_3

pip install -r requirement.txt

下载数据

这里下载初赛、复赛数据、模型,合并训练集、测试集。

预处理

如果不更换数据集,不需要执行这一步。

如果更换其他数据集,一并更换 files/train.csv

cd code/preprocessing
python map_word_to_index.py
python analysis_dataset.py  

训练

cd code/ocr
python main.py

测试

f1score在0.9以下,lr=0.001,不使用hard-mining;

f1score在0.9以上,lr=0.0001,使用hard-mining;

生成的model保存在不同的文件夹里。

cd code/ocr
python main.py --phase test --resume  ../../data/models-small/densenet/eval-16-1/best_f1score.ckpt
Owner
尹畅
Ph.D. in CSE Research interests: deep learning, active learning, medical application
尹畅
Code for the paper "Controllable Video Captioning with an Exemplar Sentence"

SMCG Code for the paper "Controllable Video Captioning with an Exemplar Sentence" Introduction We investigate a novel and challenging task, namely con

10 Dec 04, 2022
text detection mainly based on ctpn model in tensorflow, id card detect, connectionist text proposal network

text-detection-ctpn Scene text detection based on ctpn (connectionist text proposal network). It is implemented in tensorflow. The origin paper can be

Shaohui Ruan 3.3k Dec 30, 2022
Textboxes_plusplus implementation with Tensorflow (python)

TextBoxes++-TensorFlow TextBoxes++ re-implementation using tensorflow. This project is greatly inspired by slim project And many functions are modifie

81 Dec 07, 2022
Erosion and dialation using structure element in OpenCV python

Erosion and dialation using structure element in OpenCV python

Tamzid hasan 2 Nov 11, 2021
Geometric Augmentation for Text Image

Text Image Augmentation A general geometric augmentation tool for text images in the CVPR 2020 paper "Learn to Augment: Joint Data Augmentation and Ne

Canjie Luo 440 Jan 05, 2023
Simple SDF mesh generation in Python

Generate 3D meshes based on SDFs (signed distance functions) with a dirt simple Python API.

Michael Fogleman 1.1k Jan 08, 2023
ISI's Optical Character Recognition (OCR) software for machine-print and handwriting data

VistaOCR ISI's Optical Character Recognition (OCR) software for machine-print and handwriting data Publications "How to Efficiently Increase Resolutio

ISI Center for Vision, Image, Speech, and Text Analytics 21 Dec 08, 2021
Generating .npy dataset and labels out of given image, containing numbers from 0 to 9, using opencv

basic-dataset-generator-from-image-of-numbers generating .npy dataset and labels out of given image, containing numbers from 0 to 9, using opencv inpu

1 Jan 01, 2022
Python-based tools for document analysis and OCR

ocropy OCRopus is a collection of document analysis programs, not a turn-key OCR system. In order to apply it to your documents, you may need to do so

OCRopus 3.2k Dec 31, 2022
fishington.io bot with OpenCV and NumPy

fishington.io-bot fishington.io bot with using OpenCV and NumPy bot can continue to fishing fully automatically how to use Open cmd in fishington.io-b

Bahadır Araz 77 Jan 02, 2023
Pixie - A full-featured 2D graphics library for Python

Pixie - A full-featured 2D graphics library for Python Pixie is a 2D graphics library similar to Cairo and Skia. pip install pixie-python Features: Ty

treeform 65 Dec 30, 2022
Demo processor to illustrate OCR-D Python API

ocrd_vandalize/ Demo processor to illustrate the OCR-D/core Python API Description :TODO: write docs :) Installation From PyPI pip3 install ocrd_vanda

Konstantin Baierer 5 May 05, 2022
2 telegram-bots: for image recognition and for text generation

💻 📱 Telegram_Bots 🔎 & 📖 2 telegram-bots: for image recognition and for text generation. About Image recognition bot: User sends a photo and bot de

Marina Polukoshko 1 Jan 27, 2022
EAST for ICPR MTWI 2018 Challenge II (Text detection of network images)

EAST_ICPR2018: EAST for ICPR MTWI 2018 Challenge II (Text detection of network images) Introduction This is a repository forked from argman/EAST for t

QichaoWu 49 Dec 24, 2022
Deskewing images with slanted content

skew_correction De-skewing images with slanted content by finding the deviation using Canny Edge Detection. To Run: In python 3.6, from deskew import

13 Aug 27, 2022
Using computer vision method to recognize and calcutate the features of the architecture.

building-feature-recognition In this repository, we accomplished building feature recognition using traditional/dl-assisted computer vision method. Th

4 Aug 11, 2022
A facial recognition device is a device that takes an image or a video of a human face and compares it to another image faces in a database.

A facial recognition device is a device that takes an image or a video of a human face and compares it to another image faces in a database. The structure, shape and proportions of the faces are comp

Pavankumar Khot 4 Mar 19, 2022
~1000 book pages + OpenCV + python = page regions identified as paragraphs, lines, images, captions, etc.

cosc428-structor I had an open-ended Computer Vision assignment to complete, and an out-of-copyright book that I wanted to turn into an ebook. Convent

Chad Oliver 45 Dec 06, 2022
Color Picker and Color Detection tool for METR4202

METR4202 Color Detection Help This is sample code that can be used for the METR4202 project demo. There are two files provided, both running on Python

Miguel Valencia 1 Oct 23, 2021
DouZero is a reinforcement learning framework for DouDizhu - 斗地主AI

[ICML 2021] DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning | 斗地主AI

Kwai 3.1k Jan 05, 2023