The official code for “DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction”, ACM MM, Oral Paper, 2021.

Last update: Dec 26, 2022

Overview

Good news! Our new work, DocScanner: Robust Document Image Rectification with Progressive Learning, achieves state-of-the-art performance on the DocUNet benchmark dataset.

DocTr

DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction
ACM MM 2021 Oral

Questions and discussions are welcome!

Training

For geometric unwarping, we train the GeoTr network on the Doc3d dataset.
For illumination correction, we train the IllTr network on the DRIC dataset.
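The README does not spell out how GeoTr's prediction is turned into a flat document. A common approach in learning-based document rectification, assumed here rather than taken from this repository, is for the network to predict a dense backward map that tells each output pixel where to sample in the distorted photo. The NumPy sketch below applies such a map with bilinear interpolation; the function name `unwarp_with_backward_map` and the (x, y) pixel-coordinate layout of the map are hypothetical choices for illustration.

```python
import numpy as np


def unwarp_with_backward_map(image: np.ndarray, bm: np.ndarray) -> np.ndarray:
    """Resample a distorted image with a dense backward map (hypothetical helper).

    image: (H_in, W_in, C) array holding the distorted document photo.
    bm:    (H_out, W_out, 2) array; bm[y, x] = (sx, sy) are the pixel
           coordinates in `image` that output pixel (y, x) samples from.
    Returns a float64 array of shape (H_out, W_out, C).
    """
    h_in, w_in = image.shape[:2]
    img = image.astype(np.float64)

    # Clamp sample positions to the image border.
    sx = np.clip(bm[..., 0], 0, w_in - 1)
    sy = np.clip(bm[..., 1], 0, h_in - 1)

    # Integer corners of the 2x2 neighbourhood around each sample point.
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w_in - 1)
    y1 = np.minimum(y0 + 1, h_in - 1)

    # Fractional offsets are the bilinear weights (trailing axis broadcasts over C).
    wx = (sx - x0)[..., None]
    wy = (sy - y0)[..., None]

    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bottom = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


# Usage: an identity backward map reproduces the input unchanged.
h, w = 4, 6
photo = np.random.default_rng(0).random((h, w, 3))
ys, xs = np.mgrid[0:h, 0:w]
identity_bm = np.stack([xs, ys], axis=-1).astype(np.float64)
flat = unwarp_with_backward_map(photo, identity_bm)
print(np.allclose(flat, photo))  # True
```

A real pipeline may instead work in normalized coordinates and use a framework resampler such as PyTorch's `grid_sample`; the NumPy version is only meant to make the sampling step explicit.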

Inference

Download the pretrained models here and put them in $ROOT/model_pretrained/.
Geometric unwarping:
```
python inference.py
```
Geometric unwarping and illumination rectification:
```
python inference.py --ill_rec True
```
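If you script several runs, a small wrapper can check that the pretrained weights are in place before calling `inference.py`. The helpers below are hypothetical and not part of the repository: they only check that `$ROOT/model_pretrained/` is non-empty (the expected checkpoint file names are not listed here), assume `inference.py` sits in `$ROOT`, and pass the `--ill_rec True` flag exactly as in the commands above.

```python
import subprocess
from pathlib import Path
from typing import List

# Hypothetical helpers; not part of the DocTr repository.


def pretrained_models_present(root: str) -> bool:
    """True if $ROOT/model_pretrained/ exists and holds at least one file."""
    model_dir = Path(root) / "model_pretrained"
    return model_dir.is_dir() and any(p.is_file() for p in model_dir.iterdir())


def build_inference_command(ill_rec: bool = False) -> List[str]:
    """Mirror the README commands: geometric only, or geometric + illumination."""
    cmd = ["python", "inference.py"]
    if ill_rec:
        cmd += ["--ill_rec", "True"]
    return cmd


def run_inference(root: str, ill_rec: bool = False) -> int:
    """Run inference.py from `root` (assumed to be the repository root)."""
    if not pretrained_models_present(root):
        missing = Path(root) / "model_pretrained"
        raise FileNotFoundError(f"No pretrained models found in {missing}")
    return subprocess.run(build_inference_command(ill_rec), cwd=root).returncode
```

For example, calling `run_inference(".", ill_rec=True)` from the repository root should be equivalent to the second command above, but fails early with a clear error if the weights were not downloaded.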

Evaluation

We use the same evaluation code as the DocUNet benchmark dataset, run in MATLAB 2019a.
Scores can vary across MATLAB versions, so please compare results computed with the same version.
Use the images available here to reproduce the quantitative results reported in the paper and for further comparison.

Citation

If you find this code useful for your research, please use the following BibTeX entry.

```
@inproceedings{feng2021doctr,
  title={DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction},
  author={Feng, Hao and Wang, Yuechen and Zhou, Wengang and Deng, Jiajun and Li, Houqiang},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
  pages={273--281},
  year={2021}
}

@article{feng2021docscanner,
  title={DocScanner: Robust Document Image Rectification with Progressive Learning},
  author={Feng, Hao and Zhou, Wengang and Deng, Jiajun and Tian, Qi and Li, Houqiang},
  journal={arXiv preprint arXiv:2110.14968},
  year={2021}
}
```

Owner

Hao Feng