PyTorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer.

Last update: Jan 07, 2023

Overview

DocEnTR

Description

PyTorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer. The model is built on top of the vit-pytorch vision transformer library. It can be used to enhance (binarize) degraded document images, as shown in the following samples.

[Sample images: degraded images (left) and our binarization (right)]

Download Code

Clone the repository:

git clone https://github.com/dali92002/DocEnTR
cd DocEnTR

Requirements

Install the dependencies listed in requirements.txt, for example with pip install -r requirements.txt.

Process Data

Data Path

We gathered the DIBCO, H-DIBCO and PALM datasets and organized them in one folder. You can download it from this link. After downloading, extract the folder named DIBCOSETS and place it in your desired data path, so that the datasets end up in /YOUR_DATA_PATH/DIBCOSETS/
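Before running any of the scripts, it can help to confirm that the extracted folder sits where the layout above expects it. The following is a minimal sketch; the helper name dibcosets_dir is hypothetical and not part of this repository, and it only checks for the top-level DIBCOSETS folder, not its contents.

```python
from pathlib import Path


def dibcosets_dir(data_path: str) -> Path:
    """Return <data_path>/DIBCOSETS, or raise if the folder is missing.

    Hypothetical helper for illustration; not part of the DocEnTr code.
    """
    root = Path(data_path) / "DIBCOSETS"
    if not root.is_dir():
        raise FileNotFoundError(f"Expected the extracted datasets at {root}")
    return root
```

Calling dibcosets_dir("/YOUR_DATA_PATH/") fails early with a readable message if the archive was extracted somewhere else.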

Data Splitting

To prepare the data, run process_dibco.py and specify the data path, the split size, and the validation and testing sets. In this example, we set the split size to 256 × 256, the validation set to 2016 and the testing set to 2018:

python process_dibco.py --data_path /YOUR_DATA_PATH/ --split_size 256 --testing_dataset 2018 --validation_dataset 2016
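To make the role of the split size concrete, the sketch below computes where 256 × 256 tiles would start on an image of a given size. It assumes that images whose sides are not multiples of the split size are padded up to the next multiple; that padding rule and the function name tile_origins are assumptions for illustration, and process_dibco.py may handle borders differently.

```python
def tile_origins(height: int, width: int, split_size: int = 256):
    """List the (top, left) corner of every split_size x split_size tile.

    Assumes the image is padded up to a multiple of split_size, which is
    an illustrative choice and not necessarily what process_dibco.py does.
    """
    rows = -(-height // split_size)  # ceiling division
    cols = -(-width // split_size)
    return [(r * split_size, c * split_size)
            for r in range(rows)
            for c in range(cols)]
```

Under this assumption, a 300 × 500 image is covered by a 2 × 2 grid of tiles, while an image that is exactly 256 × 256 yields a single tile.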

Using DocEnTr

Training

For training, specify the desired settings (batch_size, patch_size, model_size, split_size and number of training epochs) when running train.py. For example, for a base model with a patch_size of 16 × 16 and a batch_size of 32, we use the following command:

python train.py --data_path /YOUR_DATA_PATH/ --batch_size 32 --vit_model_size base --vit_patch_size 16 --epochs 151 --split_size 256 --validation_dataset 2016

After each epoch, visualization results on the validation dataset are written to a folder named vis+"YOUR_EXPERIMENT_SETTINGS", which is created automatically. For the command above, the folder is named visbase_256_16. The best weights are saved in the folder named "weights".
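The folder name in the example appears to combine the model size, split size and patch size. The sketch below reproduces that pattern; it is inferred from the single example visbase_256_16, so the exact format used by train.py is an assumption, and vis_folder_name is a hypothetical helper.

```python
def vis_folder_name(model_size: str, split_size: int, patch_size: int) -> str:
    """Build the visualization folder name for an experiment.

    The pattern is inferred from the example 'visbase_256_16' and may not
    match train.py in every case.
    """
    return f"vis{model_size}_{split_size}_{patch_size}"
```

With the settings from the training command, vis_folder_name("base", 256, 16) gives visbase_256_16.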

Testing on a DIBCO dataset

To test the trained model on a specific DIBCO dataset, the dataset must match the one specified in the Process Data section; if it does not, run process_dibco.py again. Download the model weights (see the Model Zoo section) or use your own trained weights, then run the following command. Here, we test on H-DIBCO 2018 using the Base model with an 8 × 8 patch_size and a batch_size of 16. The binarized images will be saved in the folder ./vis+"YOUR_CONFIGS_HERE"/epoch_testing/

python test.py --data_path /YOUR_DATA_PATH/ --model_weights_path /THE_MODEL_WEIGHTS_PATH/ --batch_size 16 --vit_model_size base --vit_patch_size 8 --split_size 256 --testing_dataset 2018
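The patch size controls how many tokens the transformer sees for each split. As a rough sketch, assuming the usual vision transformer scheme of non-overlapping square patches (the function name num_patches is hypothetical), the count can be worked out as follows:

```python
def num_patches(split_size: int, patch_size: int) -> int:
    """Number of non-overlapping patches in a square split.

    Assumes the standard ViT patching scheme, where the input side must
    be divisible by the patch side.
    """
    if split_size % patch_size != 0:
        raise ValueError("split_size must be divisible by patch_size")
    per_side = split_size // patch_size
    return per_side * per_side
```

For a 256 × 256 split this gives 256 patches at a patch size of 16 and 1024 patches at a patch size of 8, so the smaller patch size means a sequence four times longer.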

Demo

To be added ... (Using our Pretrained Models To Binarize A Single Degraded Image)

Model Zoo

In this section we release the pre-trained weights for all the best DocEnTr model variants trained on DIBCO benchmarks.

Testing data	Model	Patch size	URL	PSNR
DIBCO 2011	DocEnTr-Base	8x8	model	20.81
DIBCO 2011	DocEnTr-Large	16x16	model	20.62
H-DIBCO 2012	DocEnTr-Base	8x8	model	22.29
H-DIBCO 2012	DocEnTr-Large	16x16	model	22.04
DIBCO 2017	DocEnTr-Base	8x8	model	19.11
DIBCO 2017	DocEnTr-Large	16x16	model	18.85
H-DIBCO 2018	DocEnTr-Base	8x8	model	19.46
H-DIBCO 2018	DocEnTr-Large	16x16	model	19.47
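For readers unfamiliar with the metric in the last column, PSNR (peak signal-to-noise ratio) is commonly defined as 10 · log10(MAX² / MSE), where MSE is the mean squared error between the prediction and the ground truth. The sketch below implements that standard definition on plain nested lists; it is not the evaluation code used to produce the table, and the 0 to 255 pixel range is an assumption.

```python
import math


def psnr(pred, target, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio between two equally sized 2-D images.

    Standard definition: 10 * log10(max_value**2 / MSE). The pixel range
    (0..255) is assumed here, not taken from the DocEnTr evaluation.
    """
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    if len(flat_pred) != len(flat_target) or not flat_pred:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((p - t) ** 2 for p, t in zip(flat_pred, flat_target)) / len(flat_pred)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)
```

Higher values mean the binarized output is closer to the ground truth; two identical images give an infinite PSNR.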

Citation

If you find this useful for your research, please cite it as follows:

@article{souibgui2022docentr,
  title={DocEnTr: An end-to-end document image enhancement transformer},
  author={Souibgui, Mohamed Ali and Biswas, Sanket and Jemni, Sana Khamekhem and Kessentini, Yousri and Forn{\'e}s, Alicia and Llad{\'o}s, Josep and Pal, Umapada},
  journal={arXiv preprint arXiv:2201.10252},
  year={2022}
}

Authors

Conclusion

There should be no bugs in this code, but if there are, we are sorry for that :') !!

Owner

Mohamed Ali Souibgui
