CUTIE (TensorFlow implementation of Convolutional Universal Text Information Extractor)

Last update: Dec 20, 2022

Overview

CUTIE

TensorFlow implementation of the paper "CUTIE: Learning to Understand Documents with Convolutional Universal Text Information Extractor" (Xiaohui Zhao).

CUTIE is a 2-D key information extraction algorithm for receipt-style documents; it can equally be viewed as 2-D named entity recognition (NER) or 2-D slot filling. Before training or running inference with CUTIE, run any OCR algorithm to detect and recognize the text in your scanned document images, then feed the resulting structured text into the CUTIE network. Refer to the CUTIE paper for details of the procedure.
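
As a rough illustration of what "structured text" can mean here, the sketch below places OCR words on a fixed-size grid according to the center of each bounding box, which is the general idea behind the grid input described in the CUTIE paper. The `OcrWord` type, the `to_grid` function, the keep-the-first-word collision rule, and the 4x4 grid are assumptions made for this example; they are not taken from this repository.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OcrWord:
    """One recognized word and its pixel bounding box (hypothetical type)."""
    text: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def to_grid(words: List[OcrWord], image_width: float, image_height: float,
            rows: int, cols: int) -> List[List[Optional[str]]]:
    """Map each word to the grid cell that contains its box center."""
    grid: List[List[Optional[str]]] = [[None] * cols for _ in range(rows)]
    for word in words:
        center_x = (word.x_min + word.x_max) / 2
        center_y = (word.y_min + word.y_max) / 2
        # Scale the center into grid coordinates and clamp to valid indices.
        row = min(rows - 1, max(0, int(center_y / image_height * rows)))
        col = min(cols - 1, max(0, int(center_x / image_width * cols)))
        # Assumption: on a collision, keep the word that arrived first.
        if grid[row][col] is None:
            grid[row][col] = word.text
    return grid


words = [
    OcrWord("TOTAL", 10, 80, 40, 95),
    OcrWord("12.50", 60, 80, 95, 95),
]
grid = to_grid(words, image_width=100, image_height=100, rows=4, cols=4)
```

With these sample boxes, both words land in the last grid row, so their left-to-right layout on the receipt is preserved in the grid.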

Results

Results were evaluated on 4,484 receipt documents, including taxi receipts, meal and entertainment receipts, and hotel receipts, with 9 key information classes. Scores are reported as AP / softAP.

Method	#Params	Taxi	Hotel
CloudScan	-	82.0 / -	60.0 / -
BERT	110M	88.1 / -	71.7 / -
CUTIE	14M	94.0 / 97.3	74.6 / 87.0

Installation & Usage

pip install -r requirements.txt

Generate your own dictionary with main_build_dict.py / main_data_tokenizer.py
Train your model with main_train_json.py
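
A dictionary in this context maps each token to an integer id. The sketch below shows one generic way to build such a mapping from OCR text. It does not reproduce the logic of main_build_dict.py or main_data_tokenizer.py; `build_vocab`, `encode`, the `[PAD]` and `[UNK]` special tokens, whitespace tokenization, and `min_count` are all assumptions chosen for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List


def build_vocab(texts: Iterable[str], min_count: int = 1) -> Dict[str, int]:
    """Build a token-to-id mapping from whitespace-split, lowercased text."""
    counts = Counter(tok.lower() for text in texts for tok in text.split())
    # Assumption: ids 0 and 1 are reserved for padding and unknown tokens.
    vocab = {"[PAD]": 0, "[UNK]": 1}
    # Most frequent tokens first; ties are broken alphabetically.
    for token, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if count >= min_count:
            vocab[token] = len(vocab)
    return vocab


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Convert text to ids, falling back to [UNK] for unseen tokens."""
    return [vocab.get(tok.lower(), vocab["[UNK]"]) for tok in text.split()]


vocab = build_vocab(["Total 12.50", "Taxi total"])
ids = encode("Total fare", vocab)  # "fare" is unseen, so it maps to [UNK]
```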

CUTIE achieves its best performance when rows/cols are configured well for your data. For more insight, refer to the statistics in others/TrainingStatistic.xlsx.
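
One plausible reading of this advice is that the grid should be large enough that few words fall into the same cell. The sketch below counts such collisions for a candidate rows/cols setting so that different settings can be compared on your own data. The function name, the center-based cell assignment, and the sample boxes are assumptions for illustration and are not part of this repository.

```python
from collections import Counter
from typing import Iterable, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def count_collisions(boxes: Iterable[Box], image_width: float,
                     image_height: float, rows: int, cols: int) -> int:
    """Count words that share a grid cell with an earlier word."""
    cells = Counter()
    for x_min, y_min, x_max, y_max in boxes:
        center_x = (x_min + x_max) / 2
        center_y = (y_min + y_max) / 2
        row = min(rows - 1, max(0, int(center_y / image_height * rows)))
        col = min(cols - 1, max(0, int(center_x / image_width * cols)))
        cells[(row, col)] += 1
    # Every word beyond the first in a cell counts as one collision.
    return sum(n - 1 for n in cells.values() if n > 1)


boxes = [(0, 0, 20, 10), (25, 0, 45, 10), (60, 0, 90, 10)]
coarse = count_collisions(boxes, 100, 100, rows=2, cols=2)
fine = count_collisions(boxes, 100, 100, rows=2, cols=4)
```

In this toy case the 2x2 grid merges the first two words into one cell, while the 2x4 grid keeps all three apart.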

Others

For an example of the input format, refer to the project's issue discussion.

Apply any OCR tool that helps you detect and recognize words in the scanned document image.
Label the OCR results with key information classes in the same way as the .json file in the invoice_data folder. (Thanks to @4kssoft.)
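
The two steps above can be sketched as a small script that turns OCR output and class assignments into a JSON-serializable label. The key names used here (`text_boxes`, `fields`, `field_name`, `value_id`, `value_text`), the `make_label` helper, and the `TotalAmount` class are assumptions for illustration; check the sample .json file in the invoice_data folder for the actual schema.

```python
import json
from typing import Dict, List, Sequence, Tuple

Word = Tuple[str, Sequence[float]]  # (text, [x_min, y_min, x_max, y_max])


def make_label(words: List[Word], fields: Dict[str, List[int]]) -> dict:
    """Combine OCR words and per-class word indices into one label dict."""
    return {
        # One entry per OCR word; the id is the word's position in the list.
        "text_boxes": [
            {"id": i, "bbox": list(bbox), "text": text}
            for i, (text, bbox) in enumerate(words)
        ],
        # One entry per key information class, pointing at word ids.
        "fields": [
            {
                "field_name": name,
                "value_id": list(ids),
                "value_text": [words[i][0] for i in ids],
            }
            for name, ids in fields.items()
        ],
    }


words = [("TOTAL", (10, 80, 40, 95)), ("12.50", (60, 80, 95, 95))]
label = make_label(words, {"TotalAmount": [1]})
serialized = json.dumps(label, indent=2)  # ready to write to a .json file
```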
