A Number Recognition algorithm

Last update: Nov 12, 2021

Overview

Paddle-VisualAttention

Results_Compared

| Methods | Steps | GPU | Batch Size | Learning Rate | Patience | Decay Step | Decay Rate | Training Speed (FPS) | Accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PaddlePaddle_SVHNClassifier | 54000 | GTX 1080 Ti | 1024 | 0.01 | 100 | 625 | 0.9 | ~1700 | 95.65% |
| Pytorch_SVHNClassifier | 54000 | GTX 1080 Ti | 512 | 0.16 | 100 | 625 | 0.9 | ~1700 | 95.65% |
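
The Learning Rate, Decay Step and Decay Rate columns suggest a step-wise exponential schedule. The following is a minimal sketch, assuming the rate is multiplied by the decay rate once every decay step (staircase decay). The function name `decayed_lr` is hypothetical and is not taken from the repository; the actual training script may schedule the rate differently.

```python
def decayed_lr(base_lr: float, step: int,
               decay_steps: int = 625, decay_rate: float = 0.9) -> float:
    """Staircase exponential decay (assumed schedule shape).

    The rate is multiplied by decay_rate once every decay_steps steps.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    return base_lr * decay_rate ** (step // decay_steps)


if __name__ == "__main__":
    # Hyperparameters from the PaddlePaddle row of the results table.
    for step in (0, 625, 6250, 54000):
        print(step, decayed_lr(0.01, step))
```

Under this assumption the rate stays at its base value for the first 624 steps and drops by 10% at step 625, then again every 625 steps after that.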

Introduction

The main idea of this exercise is to study how the state of the art and the main work on visual attention models have evolved. Two datasets are studied: augmented MNIST and SVHN. The former covers a canonical problem, handwritten digit recognition, but with clutter and translation added; the latter covers a real-world problem, street view house number (SVHN) transcription. The following papers are studied to build a good intuition for choosing a suitable model for each of these challenges.

For more details, please refer to this blog

Recommended environment

Python 3.6+
paddlepaddle-gpu 2.0.2
nccl 2.0+
editdistance
visdom
h5py
protobuf
lmdb

Install

Install env

Install paddle following the official tutorial.

pip install visdom
pip install h5py
pip install protobuf
pip install lmdb

Dataset

Download SVHN Dataset format 1

Extract it to the data folder. Your folder structure should then look like this:

SVHNClassifier
    - data
        - extra
            - 1.png 
            - 2.png
            - ...
            - digitStruct.mat
        - test
            - 1.png 
            - 2.png
            - ...
            - digitStruct.mat
        - train
            - 1.png 
            - 2.png
            - ...
            - digitStruct.mat
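
Before converting, it can help to confirm that the extracted files match the tree above. This is a minimal sketch using only the standard library; the helper name `missing_svhn_files` is hypothetical and is not part of the repository.

```python
from pathlib import Path

# Split names taken from the folder tree above.
SPLITS = ("extra", "test", "train")


def missing_svhn_files(data_dir: str) -> list:
    """Return the expected paths that are absent under data_dir."""
    root = Path(data_dir)
    missing = []
    for split in SPLITS:
        split_dir = root / split
        if not split_dir.is_dir():
            missing.append(str(split_dir))
            continue
        if not (split_dir / "digitStruct.mat").is_file():
            missing.append(str(split_dir / "digitStruct.mat"))
        # any() stops at the first PNG, so large folders are not fully scanned.
        if not any(split_dir.glob("*.png")):
            missing.append(str(split_dir / "*.png"))
    return missing


if __name__ == "__main__":
    problems = missing_svhn_files("./data")
    print("layout OK" if not problems else "missing: " + ", ".join(problems))
```

An empty list means every split folder contains `digitStruct.mat` and at least one PNG image.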

Usage

(Optional) Take a glance at original images with bounding boxes
Open `draw_bbox.ipynb` in Jupyter

Convert to LMDB format

$ python convert_to_lmdb.py --data_dir ./data

(Optional) Test for reading LMDBs

Open `read_lmdb_sample.ipynb` in Jupyter

Train

$ python train.py --data_dir ./data --logdir ./logs
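
The results table lists a Patience of 100. One common reading of that hyperparameter is early stopping: training ends after that many consecutive evaluations without a new best accuracy. The sketch below illustrates that idea only; the class `PatienceTracker` is hypothetical, and `train.py` may apply patience differently.

```python
class PatienceTracker:
    """Counts evaluations without improvement and signals when to stop."""

    def __init__(self, patience: int = 100):
        self.initial_patience = patience
        self.remaining = patience
        self.best_accuracy = 0.0

    def update(self, accuracy: float) -> bool:
        """Record one evaluation; return True if training should stop."""
        if accuracy > self.best_accuracy:
            self.best_accuracy = accuracy
            # An improvement resets the remaining budget.
            self.remaining = self.initial_patience
        else:
            self.remaining -= 1
        return self.remaining <= 0


if __name__ == "__main__":
    tracker = PatienceTracker(patience=3)
    for acc in (0.80, 0.85, 0.84, 0.85, 0.83):
        if tracker.update(acc):
            print("stop; best accuracy:", tracker.best_accuracy)
            break
```

Resetting the counter on every improvement means a long run is only cut short once accuracy has genuinely plateaued.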

Retrain if needed

$ python train.py --data_dir ./data --logdir ./logs_retrain --restore_checkpoint ./logs/model-100.pth

Evaluate

$ python eval.py --data_dir ./data ./logs/model-100.pth
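
This README does not state how the Accuracy figure is computed. A minimal sketch, assuming whole-number exact match (a house number counts as correct only if every digit is right); the function `sequence_accuracy` is hypothetical and is not taken from `eval.py`.

```python
def sequence_accuracy(predictions, targets) -> float:
    """Fraction of samples whose predicted digit string equals the target exactly."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    correct = sum(1 for p, t in zip(predictions, targets) if p == t)
    return correct / len(targets)


if __name__ == "__main__":
    predicted = ["123", "45", "7"]
    actual = ["123", "46", "7"]
    print(f"accuracy: {sequence_accuracy(predicted, actual):.2%}")
```

Exact match is stricter than per-digit accuracy: a single wrong digit makes the whole number count as an error.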

Visualize

$ python -m visdom.server
$ python visualize.py --logdir ./logs

Infer

$ python infer.py --checkpoint=./logs/model-100.pth ./images/test1.png

Clean

$ rm -rf ./logs
or
$ rm -rf ./logs_retrain
