STEFANN: Scene Text Editor using Font Adaptive Neural Network

Overview

Getting Started  •   Training Networks  •   External Links  •   Citation  •   License



The official GitHub repository for the paper on STEFANN: Scene Text Editor using Font Adaptive Neural Network.


Getting Started

1. Installing Dependencies

Package     Source  Version  Tested version (updated on April 14, 2020)
Python      Conda   3.7.7    ✔️
Pip         Conda   20.0.2   ✔️
Numpy       Conda   1.18.1   ✔️
Requests    Conda   2.23.0   ✔️
TensorFlow  Conda   2.1.0    ✔️
Keras       Conda   2.3.1    ✔️
Pillow      Conda   7.0.0    ✔️
Colorama    Conda   0.4.3    ✔️
OpenCV      PyPI    4.2.0    ✔️
PyQt5       PyPI    5.14.2   ✔️

💥 Quick installation

Step 1: Install Git and Conda package manager (Miniconda / Anaconda)

Step 2: Update and configure Conda

conda update conda
conda config --set env_prompt "({name}) "

Step 3: Clone this repository and change directory to repository root

git clone https://github.com/prasunroy/stefann.git
cd stefann

Step 4: Create an environment and install dependencies

On Linux and Windows

  • To create CPU environment: conda env create -f release/env_cpu.yml
  • To create GPU environment: conda env create -f release/env_gpu.yml

On macOS

  • To create CPU environment: conda env create -f release/env_osx.yml

💥 Quick test

Step 1: Download models and pretrained checkpoints into release/models directory

Step 2: Download sample images and extract into release/sample_images directory

stefann/
├── ...
├── release/
│   ├── models/
│   │   ├── colornet.json
│   │   ├── colornet_weights.h5
│   │   ├── fannet.json
│   │   └── fannet_weights.h5
│   ├── sample_images/
│   │   ├── 01.jpg
│   │   ├── 02.jpg
│   │   └── ...
│   └── ...
└── ...
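
Before launching the application, it can help to confirm that the downloaded files match the layout above. The following is a minimal sketch and not part of the repository: the helper name `missing_model_files` is hypothetical, and it checks only the four model files listed in the tree.

```python
from pathlib import Path

# File names taken from the directory tree above.
EXPECTED_MODEL_FILES = (
    "colornet.json",
    "colornet_weights.h5",
    "fannet.json",
    "fannet_weights.h5",
)


def missing_model_files(release_dir):
    """Return the expected model files that are absent from release_dir/models.

    Hypothetical helper for illustration; STEFANN itself does not ship it.
    """
    models_dir = Path(release_dir) / "models"
    return [name for name in EXPECTED_MODEL_FILES
            if not (models_dir / name).is_file()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    missing = missing_model_files("release")
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All model files found.")
```

Run from the repository root, the script lists any file that still needs to be downloaded into release/models.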

Step 3: Activate environment

To activate CPU environment: conda activate stefann-cpu
To activate GPU environment: conda activate stefann-gpu

Step 4: Change directory to release and run STEFANN

cd release
python stefann.py

2. Editing Results 😆


Each image pair consists of the original image (Left) and the edited image (Right).


Training Networks

1. Downloading Datasets

Download datasets and extract the archives into datasets directory under repository root.

stefann/
├── ...
├── datasets/
│   ├── fannet/
│   │   ├── pairs/
│   │   ├── train/
│   │   └── valid/
│   └── colornet/
│       ├── test/
│       ├── train/
│       └── valid/
└── ...

📌 Description of datasets/fannet

This dataset is used to train FANnet and it consists of 3 directories: fannet/pairs, fannet/train and fannet/valid. The directories fannet/train and fannet/valid consist of 1015 and 300 sub-directories respectively, each corresponding to one specific font. Each font directory contains 64x64 grayscale images of 62 English alphanumeric characters (10 numerals + 26 upper-case letters + 26 lower-case letters). The filename format is xx.jpg where xx is the ASCII value of the corresponding character (e.g. "48.jpg" implies an image of character "0"). The directory fannet/pairs contains 50 image pairs, each corresponding to a random font from fannet/valid. Each image pair is horizontally concatenated to a dimension of 128x64. The filename format is id_xx_yy.jpg where id is the image identifier, xx and yy are the ASCII values of source and target characters respectively (e.g. "00_65_66.jpg" implies a transformation from source character "A" to target character "B" for the image with identifier "00").
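
The filename conventions described above can be decoded with a few lines of Python. This is a minimal sketch for illustration only: the function names are hypothetical and do not come from the repository, and it assumes filenames follow the xx.jpg and id_xx_yy.jpg formats exactly.

```python
import string
from pathlib import Path

# The 62 characters covered by each font directory:
# 10 numerals + 26 upper-case letters + 26 lower-case letters.
CHARACTERS = string.digits + string.ascii_uppercase + string.ascii_lowercase


def expected_char_filenames():
    """List the filenames a complete font directory should contain."""
    return [f"{ord(char)}.jpg" for char in CHARACTERS]


def decode_char_filename(filename):
    """Map a font-directory filename such as '48.jpg' to its character."""
    return chr(int(Path(filename).stem))


def decode_pair_filename(filename):
    """Split 'id_xx_yy.jpg' into (identifier, source char, target char)."""
    identifier, source_code, target_code = Path(filename).stem.split("_")
    return identifier, chr(int(source_code)), chr(int(target_code))


print(decode_char_filename("48.jpg"))        # 0
print(decode_pair_filename("00_65_66.jpg"))  # ('00', 'A', 'B')
print(len(expected_char_filenames()))        # 62
```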

📌 Description of datasets/colornet

This dataset is used to train Colornet and it consists of 3 directories: colornet/test, colornet/train and colornet/valid. Each directory consists of 5 sub-directories: _color_filters, _mask_pairs, input_color, input_mask and output_color. The directory _color_filters contains synthetically generated 64x64 color filters, including both solid and gradient colors. The directory _mask_pairs contains a set of 64x64 grayscale image pairs selected at random from the 1315 fonts available in datasets/fannet (1015 in fannet/train plus 300 in fannet/valid). Each image pair is horizontally concatenated to a dimension of 128x64. For colornet/train and colornet/valid, every color filter is applied to every mask pair. This results in 64x64 image triplets of color source image, binary target image and color target image in the input_color, input_mask and output_color directories respectively. For colornet/test, each color filter is applied to only one mask pair to generate similar image triplets. With a fixed set of 100 mask pairs, 80000 colornet/train and 20000 colornet/valid samples are generated from 800 and 200 color filters respectively. With another set of 50 mask pairs, 50 colornet/test samples are generated from 50 color filters.
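
The sample counts quoted above follow directly from how filters and mask pairs are combined. The short check below is only an arithmetic sketch; the function names are hypothetical and nothing here reads the actual dataset.

```python
def crossed_sample_count(num_mask_pairs, num_color_filters):
    """Train/valid splits: every color filter is applied to every mask pair."""
    return num_mask_pairs * num_color_filters


def paired_sample_count(num_mask_pairs, num_color_filters):
    """Test split: filters and mask pairs are matched one to one."""
    return min(num_mask_pairs, num_color_filters)


train_samples = crossed_sample_count(100, 800)
valid_samples = crossed_sample_count(100, 200)
test_samples = paired_sample_count(50, 50)
total_fonts = 1015 + 300  # fannet/train + fannet/valid font directories

print(train_samples, valid_samples, test_samples, total_fonts)
# 80000 20000 50 1315
```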

2. Training FANnet and Colornet

Step 1: Activate environment

To activate CPU environment: conda activate stefann-cpu
To activate GPU environment: conda activate stefann-gpu

Step 2: Change directory to project root

cd stefann

Step 3: Configure and train FANnet

To configure training options, edit the configurations section (lines 40-72) of fannet.py
To start training: python fannet.py

☁️ Check this notebook hosted at Kaggle for an interactive demonstration of FANnet.

Step 4: Configure and train Colornet

To configure training options, edit the configurations section (lines 38-65) of colornet.py
To start training: python colornet.py

☁️ Check this notebook hosted at Kaggle for an interactive demonstration of Colornet.

External Links

Project  •   Paper  •   Supplementary Materials  •   Datasets  •   Models  •   Sample Images


Citation

@InProceedings{Roy_2020_CVPR,
  title     = {STEFANN: Scene Text Editor using Font Adaptive Neural Network},
  author    = {Roy, Prasun and Bhattacharya, Saumik and Ghosh, Subhankar and Pal, Umapada},
  booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2020}
}

License

Copyright 2020 by the authors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Made with ❤️ and 🍕 on Earth.