document image degradation

Last update: Nov 18, 2022

ocrodeg

The ocrodeg package is a small Python library that implements document image degradation for data augmentation in handwriting recognition and OCR applications.

The following illustrates the kinds of degradations available from ocrodeg.

%pylab inline

Populating the interactive namespace from numpy and matplotlib

rc("image", cmap="gray", interpolation="bicubic")
figsize(10, 10)
import scipy.ndimage as ndi
import ocrodeg

image = imread("testdata/W1P0.png")
imshow(image)

PAGE ROTATION

This is just for illustration; for large page rotations, you can use scipy.ndimage.rotate directly.

for i, angle in enumerate([0, 90, 180, 270]):
    subplot(2, 2, i+1)
    imshow(ndi.rotate(image, angle))

RANDOM GEOMETRIC TRANSFORMATIONS

random_transform generates random transformation parameters that work reasonably well for document image degradation. You can override the range used for each parameter with keyword arguments.

ocrodeg.random_transform()

{'angle': -0.016783842893063807,
 'aniso': 0.805280370671964,
 'scale': 0.9709145529604223,
 'translation': (0.014319657859164045, 0.03676897986267606)}
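
As a rough illustration of how such a parameter dictionary can be drawn, here is a minimal standard-library sketch. The helper name sample_transform_params, the keyword names, and the default ranges are all assumptions made for this example and may not match what ocrodeg does internally; check the library source before relying on them. The sketch samples the angle in degrees and converts it to radians, and samples scale and aniso log-uniformly so that 1.0 sits at the center of the range.

```python
import math
import random


def sample_transform_params(
    translation=(-0.05, 0.05),  # assumed range, as a fraction of the page size
    rotation=(-2.0, 2.0),       # assumed range, in degrees
    scale=(-0.1, 0.1),          # assumed range, log10 of the scale factor
    aniso=(-0.1, 0.1),          # assumed range, log10 of the aspect change
    rng=random,
):
    """Draw one set of transformation parameters (hypothetical helper)."""
    dx = rng.uniform(*translation)
    dy = rng.uniform(*translation)
    angle = math.radians(rng.uniform(*rotation))
    return {
        "angle": angle,
        "aniso": 10 ** rng.uniform(*aniso),
        "scale": 10 ** rng.uniform(*scale),
        "translation": (dx, dy),
    }


# Overriding one range by keyword argument: disable rotation, keep the rest.
params = sample_transform_params(rotation=(0.0, 0.0), rng=random.Random(0))
```

Passing a seeded random.Random instance, as in the last line, makes the draw reproducible, which is convenient when comparing augmentation settings.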

Here are four samples generated by random transforms.

for i in range(4):
    subplot(2, 2, i+1)
    imshow(ocrodeg.transform_image(image, **ocrodeg.random_transform()))

You can use transform_image directly with the different parameters to get a feel for the ranges and effects of these parameters.

for i, angle in enumerate([-2, -1, 0, 1]):
    subplot(2, 2, i+1)
    imshow(ocrodeg.transform_image(image, angle=angle*pi/180))

for i, angle in enumerate([-2, -1, 0, 1]):
    subplot(2, 2, i+1)
    imshow(ocrodeg.transform_image(image, angle=angle*pi/180)[1000:1500, 750:1250])

for i, aniso in enumerate([0.5, 1.0, 1.5, 2.0]):
    subplot(2, 2, i+1)
    imshow(ocrodeg.transform_image(image, aniso=aniso))

for i, aniso in enumerate([0.5, 1.0, 1.5, 2.0]):
    subplot(2, 2, i+1)
    imshow(ocrodeg.transform_image(image, aniso=aniso)[1000:1500, 750:1250])

for i, scale in enumerate([0.5, 0.9, 1.0, 2.0]):
    subplot(2, 2, i+1)
    imshow(ocrodeg.transform_image(image, scale=scale))

for i, scale in enumerate([0.5, 0.9, 1.0, 2.0]):
    subplot(2, 2, i+1)
    h, w = image.shape
    imshow(ocrodeg.transform_image(image, scale=scale)[h//2-200:h//2+200, w//3-200:w//3+200])
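
To make the roles of angle, scale, and aniso more concrete, the following sketch builds a single affine map about the image center with scipy.ndimage.affine_transform. It is an illustration under stated assumptions, not ocrodeg's implementation: the function name affine_warp, the way aniso is split between the two axes, and the interpretation of translation as a fraction of the image size are choices made for this example.

```python
import numpy as np
import scipy.ndimage as ndi


def affine_warp(image, angle=0.0, scale=1.0, aniso=1.0, translation=(0.0, 0.0)):
    """Rotate, scale, and stretch a 2D image about its center (illustrative sketch)."""
    h, w = image.shape
    c, s = np.cos(angle), np.sin(angle)
    rotation = np.array([[c, -s], [s, c]])
    # affine_transform maps output coordinates to input coordinates, so
    # enlarging the content by `scale` means multiplying coordinates by 1/scale.
    # Assumed convention: aniso stretches the row axis and compresses the column axis.
    stretch = np.diag([1.0 / (scale * aniso), aniso / scale])
    matrix = stretch @ rotation
    center = np.array([h, w]) / 2.0
    dy, dx = translation  # assumed to be fractions of the image height and width
    offset = center - matrix @ center + np.array([dy * h, dx * w])
    return ndi.affine_transform(image.astype(float), matrix, offset=offset,
                                order=1, mode="nearest")


page = np.zeros((64, 48))
page[24:40, 16:32] = 1.0
same = affine_warp(page)                          # default parameters: identity
tilted = affine_warp(page, angle=np.deg2rad(2))   # small skew, in radians
```

With the default parameters the matrix is the identity and the offset is zero, so the output equals the input; this is a quick sanity check for any parameterization of this kind.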

RANDOM DISTORTIONS

Pages often also have a small degree of warping. This can be modeled by random distortions. Very small and noisy random distortions also model ink spread, while large 1D random distortions model paper curl.

for i, sigma in enumerate([1.0, 2.0, 5.0, 20.0]):
    subplot(2, 2, i+1)
    noise = ocrodeg.bounded_gaussian_noise(image.shape, sigma, 5.0)
    distorted = ocrodeg.distort_with_noise(image, noise)
    h, w = image.shape
    imshow(distorted[h//2-200:h//2+200, w//3-200:w//3+200])
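
The idea behind this kind of distortion can be sketched in two steps: generate a smooth per-pixel displacement field bounded by a maximum offset, then resample the image at the displaced coordinates. The functions smooth_displacements and warp below are hypothetical stand-ins written for this example; they are not the ocrodeg functions used above and may differ from them in detail.

```python
import numpy as np
import scipy.ndimage as ndi


def smooth_displacements(shape, sigma, maxdelta, rng):
    """Per-pixel (row, col) offsets: smoothed noise rescaled to [-maxdelta, maxdelta]."""
    deltas = rng.random((2,) + tuple(shape))
    # Smooth along the two image axes only, not across the (row, col) channel axis.
    deltas = ndi.gaussian_filter(deltas, (0, sigma, sigma))
    deltas -= deltas.min()
    deltas /= deltas.max()
    return (2.0 * deltas - 1.0) * maxdelta


def warp(image, deltas):
    """Resample `image` at each pixel's own coordinates plus its displacement."""
    rows, cols = np.meshgrid(np.arange(image.shape[0]),
                             np.arange(image.shape[1]), indexing="ij")
    coords = np.array([rows, cols]) + deltas
    return ndi.map_coordinates(image, coords, order=1, mode="reflect")


rng = np.random.default_rng(0)
page = rng.random((40, 60))
deltas = smooth_displacements(page.shape, sigma=5.0, maxdelta=3.0, rng=rng)
warped = warp(page, deltas)
```

In this sketch a small sigma gives jittery, pixel-scale displacements and a large sigma gives slow, page-scale warping, while maxdelta caps how far any pixel can move.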

RULED SURFACE DISTORTIONS

for i, mag in enumerate([5.0, 20.0, 100.0, 200.0]):
    subplot(2, 2, i+1)
    noise = ocrodeg.noise_distort1d(image.shape, magnitude=mag)
    distorted = ocrodeg.distort_with_noise(image, noise)
    h, w = image.shape
    imshow(distorted[:1500])
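
A 1D distortion of this kind can be pictured as a single smooth profile along the page width that shifts every pixel in a column by the same vertical amount. The sketch below shows one way to build such a displacement field; the name curl_displacements and its defaults are assumptions for illustration and are not taken from ocrodeg.

```python
import numpy as np
import scipy.ndimage as ndi


def curl_displacements(shape, sigma=100.0, magnitude=100.0, rng=None):
    """Vertical offsets that vary smoothly across columns and are constant down each column."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = shape
    profile = ndi.gaussian_filter(rng.standard_normal(w), sigma)
    # Rescale so the largest vertical shift equals `magnitude` pixels.
    profile *= magnitude / np.abs(profile).max()
    dys = np.tile(profile, (h, 1))            # the same profile on every row
    return np.array([dys, np.zeros((h, w))])  # (row offsets, column offsets)


deltas = curl_displacements((30, 200), sigma=20.0, magnitude=5.0,
                            rng=np.random.default_rng(0))
```

The result has the same (2, height, width) layout as a general displacement field, so it can be applied with the same resampling step as a 2D distortion.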

BLUR, THRESHOLDING, NOISE

There is a range of utilities for modeling imaging artifacts: blurring, noise, and ink spread.

patch = image[1900:2156, 1000:1256]
imshow(patch)

for i, s in enumerate([0, 1, 2, 4]):
    subplot(2, 2, i+1)
    blurred = ndi.gaussian_filter(patch, s)
    imshow(blurred)

for i, s in enumerate([0, 1, 2, 4]):
    subplot(2, 2, i+1)
    blurred = ndi.gaussian_filter(patch, s)
    thresholded = 1.0*(blurred>0.5)
    imshow(thresholded)

from importlib import reload  # reload is not a builtin in Python 3
reload(ocrodeg)
for i, s in enumerate([0.0, 1.0, 2.0, 4.0]):
    subplot(2, 2, i+1)
    blurred = ocrodeg.binary_blur(patch, s)
    imshow(blurred)

for i, s in enumerate([0.0, 0.1, 0.2, 0.3]):
    subplot(2, 2, i+1)
    blurred = ocrodeg.binary_blur(patch, 2.0, noise=s)
    imshow(blurred)
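
Thresholding a blurred image at a fixed 0.5 changes how much ink survives as the blur grows. One plausible alternative, sketched below, is to pick the threshold as a percentile of the blurred image so that the fraction of dark pixels stays the same as in the input. The function blur_and_rebinarize is a hypothetical helper written for this example; it assumes dark ink on a light background with values in [0, 1] and is not claimed to match ocrodeg.binary_blur.

```python
import numpy as np
import scipy.ndimage as ndi


def blur_and_rebinarize(image, sigma, noise=0.0, rng=None):
    """Blur, optionally add Gaussian noise, then threshold to keep the dark-pixel fraction."""
    rng = np.random.default_rng() if rng is None else rng
    image = np.asarray(image, dtype=float)
    dark_percent = 100.0 * np.mean(image < 0.5)
    blurred = ndi.gaussian_filter(image, sigma)
    if noise > 0:
        blurred = blurred + rng.standard_normal(blurred.shape) * noise
    # Pixels at or below this percentile become dark, the rest become light.
    threshold = np.percentile(blurred, dark_percent)
    return (blurred > threshold).astype(float)


patch = np.ones((64, 64))
patch[20:44, 28:36] = 0.0  # a dark vertical stroke on a light background
result = blur_and_rebinarize(patch, sigma=2.0, noise=0.1,
                             rng=np.random.default_rng(0))
```

Because the threshold follows the image statistics, stronger blur rounds and roughens the stroke edges without making the stroke systematically thinner or thicker.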

MULTISCALE NOISE

from importlib import reload  # reload is not a builtin in Python 3
reload(ocrodeg)
for i in range(4):
    noisy = ocrodeg.make_multiscale_noise_uniform((512, 512))
    subplot(2, 2, i+1); imshow(noisy, vmin=0, vmax=1)

RANDOM BLOBS

for i, s in enumerate([2, 5, 10, 20]):
    subplot(2, 2, i+1)
    imshow(ocrodeg.random_blobs(patch.shape, 3e-4, s))

from importlib import reload  # reload is not a builtin in Python 3
reload(ocrodeg)
blotched = ocrodeg.random_blotches(patch, 3e-4, 1e-4)
#blotched = minimum(maximum(patch, ocrodeg.random_blobs(patch.shape, 30, 10)), 1-ocrodeg.random_blobs(patch.shape, 15, 8))
subplot(121); imshow(patch); subplot(122); imshow(blotched)

FIBROUS NOISE

imshow(ocrodeg.make_fibrous_image((256, 256), 700, 300, 0.01))

FOREGROUND / BACKGROUND SELECTION

subplot(121); imshow(patch); subplot(122); imshow(ocrodeg.printlike_multiscale(patch))

subplot(121); imshow(patch); subplot(122); imshow(ocrodeg.printlike_fibrous(patch))

Owner

NVIDIA Research Projects