document image degradation

Overview

ocrodeg

The ocrodeg package is a small Python library implementing document image degradation for data augmentation for handwriting recognition and OCR applications.

The following illustrates the kinds of degradations available from ocrodeg.

%pylab inline
Populating the interactive namespace from numpy and matplotlib
rc("image", cmap="gray", interpolation="bicubic")
figsize(10, 10)
import scipy.ndimage as ndi
import ocrodeg

image = imread("testdata/W1P0.png")
imshow(image)
<matplotlib.image.AxesImage at 0x7fabcc7ab390>

png

PAGE ROTATION

This is just for illustration; for large page rotations, you can just use ndimage.

for i, angle in enumerate([0, 90, 180, 270]):
    subplot(2, 2, i+1)
    imshow(ndi.rotate(image, angle))

png

RANDOM GEOMETRIC TRANSFORMATIONS

random_transform generates random transformation parameters that work reasonably well for document image degradation. You can override the ranges used by each of these parameters by keyword arguments.

ocrodeg.random_transform()
{'angle': -0.016783842893063807,
 'aniso': 0.805280370671964,
 'scale': 0.9709145529604223,
 'translation': (0.014319657859164045, 0.03676897986267606)}

Here are four samples generated by random transforms.

for i in xrange(4):
    subplot(2, 2, i+1)
    imshow(ocrodeg.transform_image(image, **ocrodeg.random_transform()))

png

You can use transform_image directly with the different parameters to get a feel for the ranges and effects of these parameters.

for i, angle in enumerate([-2, -1, 0, 1]):
    subplot(2, 2, i+1)
    imshow(ocrodeg.transform_image(image, angle=angle*pi/180))

png

for i, angle in enumerate([-2, -1, 0, 1]):
    subplot(2, 2, i+1)
    imshow(ocrodeg.transform_image(image, angle=angle*pi/180)[1000:1500, 750:1250])

png

for i, aniso in enumerate([0.5, 1.0, 1.5, 2.0]):
    subplot(2, 2, i+1)
    imshow(ocrodeg.transform_image(image, aniso=aniso))

png

for i, aniso in enumerate([0.5, 1.0, 1.5, 2.0]):
    subplot(2, 2, i+1)
    imshow(ocrodeg.transform_image(image, aniso=aniso)[1000:1500, 750:1250])

png

for i, scale in enumerate([0.5, 0.9, 1.0, 2.0]):
    subplot(2, 2, i+1)
    imshow(ocrodeg.transform_image(image, scale=scale))

png

for i, scale in enumerate([0.5, 0.9, 1.0, 2.0]):
    subplot(2, 2, i+1)
    h, w = image.shape
    imshow(ocrodeg.transform_image(image, scale=scale)[h//2-200:h//2+200, w//3-200:w//3+200])

png

RANDOM DISTORTIONS

Pages often also have a small degree of warping. This can be modeled by random distortions. Very small and noisy random distortions also model ink spread, while large 1D random distortions model paper curl.

for i, sigma in enumerate([1.0, 2.0, 5.0, 20.0]):
    subplot(2, 2, i+1)
    noise = ocrodeg.bounded_gaussian_noise(image.shape, sigma, 5.0)
    distorted = ocrodeg.distort_with_noise(image, noise)
    h, w = image.shape
    imshow(distorted[h//2-200:h//2+200, w//3-200:w//3+200])

png

RULED SURFACE DISTORTIONS

for i, mag in enumerate([5.0, 20.0, 100.0, 200.0]):
    subplot(2, 2, i+1)
    noise = ocrodeg.noise_distort1d(image.shape, magnitude=mag)
    distorted = ocrodeg.distort_with_noise(image, noise)
    h, w = image.shape
    imshow(distorted[:1500])

png

BLUR, THRESHOLDING, NOISE

There are a range of utilities for modeling imaging artifacts: blurring, noise, inkspread.

patch = image[1900:2156, 1000:1256]
imshow(patch)
<matplotlib.image.AxesImage at 0x7fabc88c7e10>

png

for i, s in enumerate([0, 1, 2, 4]):
    subplot(2, 2, i+1)
    blurred = ndi.gaussian_filter(patch, s)
    imshow(blurred)

png

for i, s in enumerate([0, 1, 2, 4]):
    subplot(2, 2, i+1)
    blurred = ndi.gaussian_filter(patch, s)
    thresholded = 1.0*(blurred>0.5)
    imshow(thresholded)

png

reload(ocrodeg)
for i, s in enumerate([0.0, 1.0, 2.0, 4.0]):
    subplot(2, 2, i+1)
    blurred = ocrodeg.binary_blur(patch, s)
    imshow(blurred)

png

for i, s in enumerate([0.0, 0.1, 0.2, 0.3]):
    subplot(2, 2, i+1)
    blurred = ocrodeg.binary_blur(patch, 2.0, noise=s)
    imshow(blurred)

png

MULTISCALE NOISE

reload(ocrodeg)
for i in range(4):
    noisy = ocrodeg.make_multiscale_noise_uniform((512, 512))
    subplot(2, 2, i+1); imshow(noisy, vmin=0, vmax=1)

png

RANDOM BLOBS

for i, s in enumerate([2, 5, 10, 20]):
    subplot(2, 2, i+1)
    imshow(ocrodeg.random_blobs(patch.shape, 3e-4, s))

png

reload(ocrodeg)
blotched = ocrodeg.random_blotches(patch, 3e-4, 1e-4)
#blotched = minimum(maximum(patch, ocrodeg.random_blobs(patch.shape, 30, 10)), 1-ocrodeg.random_blobs(patch.shape, 15, 8))
subplot(121); imshow(patch); subplot(122); imshow(blotched)
<matplotlib.image.AxesImage at 0x7fabc8a35490>

png

FIBROUS NOISE

imshow(ocrodeg.make_fibrous_image((256, 256), 700, 300, 0.01))
<matplotlib.image.AxesImage at 0x7fabc8852450>

png

FOREGROUND / BACKGROUND SELECTION

subplot(121); imshow(patch); subplot(122); imshow(ocrodeg.printlike_multiscale(patch))
<matplotlib.image.AxesImage at 0x7fabc8676d90>

png

subplot(121); imshow(patch); subplot(122); imshow(ocrodeg.printlike_fibrous(patch))
<matplotlib.image.AxesImage at 0x7fabc8d1b250>

png

Owner
NVIDIA Research Projects
NVIDIA Research Projects
An interactive document scanner built in Python using OpenCV

The scanner takes a poorly scanned image, finds the corners of the document, applies the perspective transformation to get a top-down view of the document, sharpens the image, and applies an adaptive

Kushal Shingote 1 Feb 12, 2022
Basic functions manipulating images using the OpenCV library

OpenCV Basic functions manipulating images using the OpenCV library. Reading Ima

Shatha Siala 3 Feb 17, 2022
A buffered and threaded wrapper for the OpenCV VideoCapture object. Can speed up video decoding significantly. Supports

A buffered and threaded wrapper for the OpenCV VideoCapture object. Can speed up video decoding significantly. Supports "with"-syntax.

Patrice Matz 0 Oct 30, 2021
Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.

hocr-tools About About the code Installation System-wide with pip System-wide from source virtualenv Available Programs hocr-check -- check the hOCR f

OCRopus 285 Dec 08, 2022
Um RPG de texto orientado a objetos.

RPG de texto Um RPG de texto orientado a objetos, sem história. Um RPG (Role-playing game) baseado em texto em que você pode viajar para alguns locais

Vinicius 3 Oct 05, 2022
Read-only mirror of https://gitlab.gnome.org/GNOME/ocrfeeder

================================= OCRFeeder - A Complete OCR Suite ================================= OCRFeeder is a complete Optical Character Recogn

GNOME Github Mirror 81 Dec 23, 2022
Some Boring Research About Products Recognition 、Duplicate Img Detection、Img Stitch、OCR

Products Recognition 介绍 商品识别,围绕在复杂的商场零售场景中,识别出货架图像中的商品信息。主要组成部分: 重复图像检测。【更新进度 4/10】 图像拼接。【更新进度 0/10】 目标检测。【更新进度 0/10】 商品识别。【更新进度 1/10】 OCR。【更新进度 1/10】

zhenjieWang 18 Jan 27, 2022
A curated list of papers, code and resources pertaining to image composition

A curated list of resources including papers, datasets, and relevant links pertaining to image composition.

BCMI 391 Dec 30, 2022
document image degradation

ocrodeg The ocrodeg package is a small Python library implementing document image degradation for data augmentation for handwriting recognition and OC

NVIDIA Research Projects 134 Nov 18, 2022
Turn images of tables into CSV data. Detect tables from images and run OCR on the cells.

Table of Contents Overview Requirements Demo Modules Overview This python package contains modules to help with finding and extracting tabular data fr

Eric Ihli 311 Dec 24, 2022
Scale-aware Automatic Augmentation for Object Detection (CVPR 2021)

SA-AutoAug Scale-aware Automatic Augmentation for Object Detection Yukang Chen, Yanwei Li, Tao Kong, Lu Qi, Ruihang Chu, Lei Li, Jiaya Jia [Paper] [Bi

Jia Research Lab 182 Dec 29, 2022
Provides OCR (Optical Character Recognition) services through web applications

OCR4all As suggested by the name one of the main goals of OCR4all is to allow basically any given user to independently perform OCR on a wide variety

174 Dec 31, 2022
keras复现场景文本检测网络CPTN: 《Detecting Text in Natural Image with Connectionist Text Proposal Network》;欢迎试用,关注,并反馈问题...

keras-ctpn [TOC] 说明 预测 训练 例子 4.1 ICDAR2015 4.1.1 带侧边细化 4.1.2 不带带侧边细化 4.1.3 做数据增广-水平翻转 4.2 ICDAR2017 4.3 其它数据集 toDoList 总结 说明 本工程是keras实现的CPTN: Detecti

mick.yi 107 Jan 09, 2023
The code for “Oriented RepPoints for Aerail Object Detection”

Oriented RepPoints for Aerial Object Detection The code for the implementation of “Oriented RepPoints”, Under review. (arXiv preprint) Introduction Or

WentongLi 207 Dec 24, 2022
A simple document layout analysis using Python-OpenCV

Run the application: python main.py *Note: For first time running the application, create a folder named "output". The application is a simple documen

Roinand Aguila 109 Dec 12, 2022
A simple Security Camera created using Opencv in Python where images gets saved in realtime in your Dropbox account at every 5 seconds

Security Camera using Opencv & Dropbox This is a simple Security Camera created using Opencv in Python where images gets saved in realtime in your Dro

Arpit Rath 1 Jan 31, 2022
TensorFlow Implementation of FOTS, Fast Oriented Text Spotting with a Unified Network.

FOTS: Fast Oriented Text Spotting with a Unified Network I am still working on this repo. updates and detailed instructions are coming soon! Table of

Masao Taketani 52 Nov 11, 2022
Recognizing the text contents from a scanned visiting card

Recognizing the text contents from a scanned visiting card. The application which is used to recognize the text from scanned images,printeddocuments,r

Faizan Habib 1 Jan 28, 2022
Open Source Differentiable Computer Vision Library for PyTorch

Kornia is a differentiable computer vision library for PyTorch. It consists of a set of routines and differentiable modules to solve generic computer

kornia 7.6k Jan 04, 2023