Detect the mathematical formula from the given picture and the same formula is extracted and converted into the latex code

Overview

Mathematical formulae extractor

The goal of this project is to create a learning based system that takes an image of a math formula and returns corresponding LaTeX code.

output

Requirements

Model

  • PyTorch (tested on v1.9)
  • Python 3.7+ & dependencies (requirements.txt)
    pip install -r requirements.txt
    

Dataset

In order to render the math in many different fonts we use XeLaTeX, generate a PDF and finally convert it to a PNG. For the last step we need to use some third party tools:

Using the model

  1. Download/Clone this repository
  2. For now you need to install the Python dependencies specified in requirements.txt (look above)
  3. Download the weights.pth file from my Google Drive and place it in the checkpoints directory

Thanks to @katie-lim, you can use a nice user interface as a quick way to get the model prediction. Just call the GUI with python gui.py. From here you can take a screenshot and the predicted latex code is rendered using MathJax and copied to your clipboard.

If the model is unsure about the what's in the image it might output a different prediction every time you click "Retry". With the temperature parameter you can control this behavior (low temperature will produce the same result).

Alternatively you can use pix2tex.py with similar functionality as gui.py, only as command line tool. In this case you don't need to install PyQt5. Using this script you can also parse already existing images from the disk.

Note: As of right now it works best with images of smaller resolution. Don't zoom in all the way before taking a picture. Double check the result carefully. You can try to redo the prediction with an other resolution if the answer was wrong.

Update: I have trained an image classifier on randomly scaled images of the training data to predict the original size. This model will automatically resize the custom image to best resemble the training data and thus increase performance of images found in the wild. To use this preprocessing step, all you have to do is download the second weights file mentioned above. You should be able to take bigger (or smaller) images of the formula and still get a satisfying result

Project Architecture

In the project Architecture we have an encoder and decoder layers in which the mathematical formulae is detected from the given picture in the encoder layer which is a CNN layer and then the image is sent to the decoder layer which is an RNN layer the mathematical formulae is extracted from the picture and then the same is converted into the latex code in this layer

model_architecture

Training the model

  1. First we need to combine the images with their ground truth labels. I wrote a dataset class (which needs further improving) that saves the relative paths to the images with the LaTeX code they were rendered with. To generate the dataset pickle file run
python dataset/dataset.py --equations path_to_textfile --images path_to_images --tokenizer path_to_tokenizer --out dataset.pkl
  1. Edit the data entry in the config file to the newly generated .pkl file. Change other hyperparameters if you want to. See settings/default.yaml for a template.
  2. Now for the actual training run
python train.py --config path_to_config_file

Model

The model consist of a ViT [1] encoder with a ResNet backbone and a Transformer [2] decoder.

Performance

BLEU score normed edit distance
0.89 0.10

Data

We need paired data for the network to learn. Luckily there is a lot of LaTeX code on the internet, e.g. wikipedia, arXiv. We also use the formulae from the I2L-140k dataset.

Fonts

Latin Modern Math, GFSNeohellenicMath.otf, Asana Math, XITS Math, Cambria Math

Contribution

Contributions of any kind are welcome.

Acknowledgment

Code taken and modified from lucidrains, rwightman, im2markup, arxiv_leaks, pkra: Mathjax, harupy: snipping tool

A bot that extract text from images using the Tesseract OCR.

Text from image (OCR) @ocr_text_bot A simple bot to extract text from images. Usage What do I need? A AWS key configured locally, see here. NodeJS. I

Weverton Marques 4 Aug 06, 2021
Code for CVPR2021 paper "Learning Salient Boundary Feature for Anchor-free Temporal Action Localization"

AFSD: Learning Salient Boundary Feature for Anchor-free Temporal Action Localization This is an official implementation in PyTorch of AFSD. Our paper

Tencent YouTu Research 146 Dec 24, 2022
Fusion 360 Add-in that creates a pair of toothed curves that can be used to split a body and create two pieces that slide and lock together.

Fusion-360-Add-In-PuzzleSpline Fusion 360 Add-in that creates a pair of toothed curves that can be used to split a body and create two pieces that sli

Michiel van Wessem 1 Nov 15, 2021
MXNet OCR implementation. Including text recognition and detection.

insightocr Text Recognition Accuracy on Chinese dataset by caffe-ocr Network LSTM 4x1 Pooling Gray Test Acc SimpleNet N Y Y 99.37% SE-ResNet34 N Y Y 9

Deep Insight 99 Nov 01, 2022
Camera Intrinsic Calibration and Hand-Eye Calibration in Pybullet

This repository is mainly for camera intrinsic calibration and hand-eye calibration. Synthetic experiments are conducted in PyBullet simulator. 1. Tes

CAI Junhao 7 Oct 03, 2022
An application of high resolution GANs to dewarp images of perturbed documents

Docuwarp This project is focused on dewarping document images through the usage of pix2pixHD, a GAN that is useful for general image to image translat

Thomas Huang 97 Dec 25, 2022
Natural language detection

Detect the language of text. What’s so cool about franc? franc can support more languages(†) than any other library franc is packaged with support for

Titus 3.8k Jan 02, 2023
Extract tables from scanned image PDFs using Optical Character Recognition.

ocr-table This project aims to extract tables from scanned image PDFs using Optical Character Recognition. Install Requirements Tesseract OCR sudo apt

Abhijeet Singh 209 Dec 06, 2022
CVPR 2021 Oral paper "LED2-Net: Monocular 360˚ Layout Estimation via Differentiable Depth Rendering" official PyTorch implementation.

LED2-Net This is PyTorch implementation of our CVPR 2021 Oral paper "LED2-Net: Monocular 360˚ Layout Estimation via Differentiable Depth Rendering". Y

Fu-En Wang 83 Jan 04, 2023
A bot that plays TFT using OCR. Keeps track of bench, board, items, and plays the user defined team comp.

NOTES: To ensure best results, make sure you are running this on a computer that has decent specs. 1920x1080 fullscreen is required in League, game mu

francis 125 Dec 30, 2022
Reference Code for AAAI-20 paper "Multi-Stage Self-Supervised Learning for Graph Convolutional Networks on Graphs with Few Labels"

Reference Code for AAAI-20 paper "Multi-Stage Self-Supervised Learning for Graph Convolutional Networks on Graphs with Few Labels" Please refer to htt

Ke Sun 1 Feb 14, 2022
A list of hyperspectral image super-solution resources collected by Junjun Jiang

A list of hyperspectral image super-resolution resources collected by Junjun Jiang. If you find that important resources are not included, please feel free to contact me.

Junjun Jiang 301 Jan 05, 2023
Document blur detection based on Laplacian operator and text detection.

Document Blur Detection For general blurred image, using the variance of Laplacian operator is a good solution. But as for the blur detection of docum

JoeyLr 5 Oct 20, 2022
Automatically download multiple papers by keywords in CVPR

CVFPaperHelper Automatically download multiple papers by keywords in CVPR Install mkdir PapersToRead cd PaperToRead pip install requests tqdm git clon

46 Jun 08, 2022
Automatically fishes for you while you are afk :)

Dank-memer-afk-script A simple and quick way to make easy money in Dank Memer! How to use Open a discord channel which has the Dank Memer bot enabled.

Pranav Doshi 9 Nov 11, 2022
A facial recognition program that plays a alarm (mp3 file) when a person i seen in the room. A basic theif using Python and OpenCV

Home-Security-Demo A facial recognition program that plays a alarm (mp3 file) when a person is seen in the room. A basic theif using Python and OpenCV

SysKey 4 Nov 02, 2021
Character Segmentation using TensorFlow

Character Segmentation Segment characters and spaces in one text line,from this paper Chinese English mixed Character Segmentation as Semantic Segment

26 Aug 25, 2022
📷 Face Recognition using Haar-Cascade Classifier, OpenCV, and Python

Face-Recognition-System Face Recognition using Haar-Cascade Classifier, OpenCV and Python. This project is based on face detection and face recognitio

1 Jan 10, 2022
The Open Source Framework for Machine Vision

SimpleCV Quick Links: About Installation [Docker] (#docker) Ubuntu Virtual Environment Arch Linux Fedora MacOS Windows Raspberry Pi SimpleCV Shell Vid

Sight Machine 2.6k Dec 31, 2022
PSENet - Shape Robust Text Detection with Progressive Scale Expansion Network.

News Python3 implementations of PSENet [1], PAN [2] and PAN++ [3] are released at https://github.com/whai362/pan_pp.pytorch. [1] W. Wang, E. Xie, X. L

1.1k Dec 24, 2022