Code for the paper "DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks" (ICCV '19)

Overview

DewarpNet

This repository contains the codes for DewarpNet training.

Recent Updates

  • [May, 2020] Added evaluation images and an important note about Matlab SSIM.
  • [Dec, 2020] Added OCR evaluation details.

Training

  • Prepare Data: train.txt & val.txt. Contents should be like:
1/824_8-cp_Page_0503-7Ns0001
1/824_1-cp_Page_0504-2Cw0001
  • Train Shape Network: python trainwc.py --arch unetnc --data_path ./data/DewarpNet/doc3d/ --batch_size 50 --tboard
  • Train Texture Mapping Network: python trainbm.py --arch dnetccnl --img_rows 128 --img_cols 128 --img_norm --n_epoch 250 --batch_size 50 --l_rate 0.0001 --tboard --data_path ./DewarpNet/doc3d

Inference:

  • Run: python infer.py --wc_model_path ./eval/models/unetnc_doc3d.pkl --bm_model_path ./eval/models/dnetccnl_doc3d.pkl --show

Evaluation (Image Metrics):

  • We use the same evaluation code as DocUNet. To reproduce the quantitative results reported in the paper use the images available here.

  • [Important note about Matlab version] We noticed that Matlab 2020a uses a different SSIM implementation which gives a better MS-SSIM score (0.5623). Whereas we have used Matlab 2018b. Please compare the scores according to your Matlab version.

Evaluation (OCR Metrics):

  • The 25 images used for OCR evaluation is /eval/ocr_eval/ocr_files.txt
  • The corresponding ground-truth text is given in /eval/ocr_eval/tess_gt.json
  • For the OCR errors reported in the paper we had used cv2.blur as pre-processing which gives higher error in all the cases. For convenience, we provide the updated numbers (without using blur) in the following table:
Method ED CER ED (no blur) CER (no blur)
DocUNet 1975.86 0.4656(0.263) 1671.80 0.403 (0.256)
DocUNet on Doc3D 1684.34 0.3955 (0.272) 1296.00 0.294 (0.235)
DewarpNet 1288.60 0.3136 (0.248) 1007.28 0.249 (0.236)
DewarpNet (ref) 1114.40 0.2692 (0.234) 812.48 0.204 (0.228)
  • We had used the Tesseract (v4.1.0) default configuration for evaluation with PyTesseract (v0.2.6).

Models:

  • Pre-trained models are available here. These models are captured prior to end-to-end training, thus won't give you the end-to-end results reported in Table 2 of the paper. Use the images provided above to get the exact numbers as Table 2.

Dataset:

  • The doc3D dataset can be downloaded using the scripts here.

More Stuff:

Citation:

If you use the dataset or this code, please consider citing our work-

@inproceedings{SagnikKeICCV2019, 
Author = {Sagnik Das*, Ke Ma*, Zhixin Shu, Dimitris Samaras, Roy Shilkrot}, 
Booktitle = {Proceedings of International Conference on Computer Vision}, 
Title = {DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks}, 
Year = {2019}}   

Acknowledgements:

Owner
[email protected]
Computer Vision Lab at Stony Brook University
<a href=[email protected]">
A buffered and threaded wrapper for the OpenCV VideoCapture object. Can speed up video decoding significantly. Supports

A buffered and threaded wrapper for the OpenCV VideoCapture object. Can speed up video decoding significantly. Supports "with"-syntax.

Patrice Matz 0 Oct 30, 2021
OCR, Scene-Text-Understanding, Text Recognition

Scene-Text-Understanding Survey [2015-PAMI] Text Detection and Recognition in Imagery: A Survey paper [2014-Front.Comput.Sci] Scene Text Detection and

Alan Tang 354 Dec 12, 2022
Tensorflow-based CNN+LSTM trained with CTC-loss for OCR

Overview This collection demonstrates how to construct and train a deep, bidirectional stacked LSTM using CNN features as input with CTC loss to perfo

Jerod Weinman 489 Dec 21, 2022
question‘s area recognition using image processing and regular expression

======================================== Paper-Question-recognition ======================================== question‘s area recognition using image p

Yuta Mizuki 7 Dec 27, 2021
chineseocr/table_line 表格线检测模型pytorch版

table_line_pytorch chineseocr/table_detct 表格线检测模型table_line pytorch版 原项目github: https://github.com/chineseocr/table-detect 1、模型转换 下载原项目table_detect模型文

1 Oct 21, 2021
Python Computer Vision Aim Bot for Roblox's Phantom Forces

Python-Phantom-Forces-Aim-Bot Python Computer Vision Aim Bot for Roblox's Phanto

drag0ngam3s 2 Jul 11, 2022
PyQT5 app that colorize black & white pictures using CNN(use pre-trained model which was made with OpenCV)

About PyQT5 app that colorize black & white pictures using CNN(use pre-trained model which was made with OpenCV) Colorizor Приложение для проекта Yand

1 Apr 04, 2022
A curated list of promising OCR resources

Call for contributor(paper summary,dataset generation,algorithm implementation and any other useful resources) awesome-ocr A curated list of promising

wanghaisheng 1.6k Jan 04, 2023
The first open-source library that detects the font of a text in a image.

Typefont Typefont is an experimental library that detects the font of a text in a image. Usage Import the main function and invoke it like in the foll

Vasile Pește 1.6k Feb 24, 2022
A semi-automatic open-source tool for Layout Analysis and Region EXtraction on early printed books.

LAREX LAREX is a semi-automatic open-source tool for layout analysis on early printed books. It uses a rule based connected components approach which

162 Jan 05, 2023
CVPR 2021 Oral paper "LED2-Net: Monocular 360˚ Layout Estimation via Differentiable Depth Rendering" official PyTorch implementation.

LED2-Net This is PyTorch implementation of our CVPR 2021 Oral paper "LED2-Net: Monocular 360˚ Layout Estimation via Differentiable Depth Rendering". Y

Fu-En Wang 83 Jan 04, 2023
make a better chinese character recognition OCR than tesseract

deep ocr See README_en.md for English installation documentation. 只在ubuntu下面测试通过,需要virtualenv安装,安装路径可自行调整: git clone https://github.com/JinpengLI/deep

Jinpeng 1.5k Dec 28, 2022
list all open dataset about ocr.

ocr-open-dataset list all open dataset about ocr. printed dataset year Born-Digital Images (Web and Email) 2011-2015 COCO-Text 2017 Text Extraction fr

hongbomin 95 Nov 24, 2022
This is a GUI for scrapping PDFs with the help of optical character recognition making easier than ever to scrape PDFs.

pdf-scraper-with-ocr With this tool I am aiming to facilitate the work of those who need to scrape PDFs either by hand or using tools that doesn't imp

Jacobo José Guijarro Villalba 75 Oct 21, 2022
A selectional auto-encoder approach for document image binarization

The code of this repository was used for the following publication. If you find this code useful please cite our paper: @article{Gallego2019, title =

Javier Gallego 89 Nov 18, 2022
QED-C: The Quantum Economic Development Consortium provides these computer programs and software for use in the fields of quantum science and engineering.

Application-Oriented Performance Benchmarks for Quantum Computing This repository contains a collection of prototypical application- or algorithm-cent

SRI International 67 Nov 30, 2022
Detect the mathematical formula from the given picture and the same formula is extracted and converted into the latex code

Mathematical formulae extractor The goal of this project is to create a learning based system that takes an image of a math formula and returns corres

6 May 22, 2022
Scale-aware Automatic Augmentation for Object Detection (CVPR 2021)

SA-AutoAug Scale-aware Automatic Augmentation for Object Detection Yukang Chen, Yanwei Li, Tao Kong, Lu Qi, Ruihang Chu, Lei Li, Jiaya Jia [Paper] [Bi

Jia Research Lab 182 Dec 29, 2022
Simple SDF mesh generation in Python

Generate 3D meshes based on SDFs (signed distance functions) with a dirt simple Python API.

Michael Fogleman 1.1k Jan 08, 2023