A PyTorch implementation of Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks

Last update: Jan 03, 2023

Overview

SVHNClassifier-PyTorch


If you're interested in C++ inference, move HERE

Results

| Steps | GPU | Batch Size | Learning Rate | Patience | Decay Step | Decay Rate | Training Speed (FPS) | Accuracy |
|---|---|---|---|---|---|---|---|---|
| 54000 | GTX 1080 Ti | 512 | 0.16 | 100 | 625 | 0.9 | ~1700 | 95.65% |
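
The Learning Rate, Decay Step, and Decay Rate values above suggest a stepwise exponential schedule. The following is a rough sketch only, assuming the rate is multiplied by the decay rate once every decay step (the staircase form is an assumption, and `learning_rate_at` is a hypothetical helper that is not part of this repository; check `train.py` for the actual behavior):

```python
def learning_rate_at(step, initial_lr=0.16, decay_steps=625, decay_rate=0.9):
    """Staircase exponential decay.

    Assumption: the learning rate is multiplied by decay_rate once every
    decay_steps training steps. Defaults are taken from the results table.
    """
    return initial_lr * decay_rate ** (step // decay_steps)


if __name__ == "__main__":
    # Print the assumed learning rate at a few points during training.
    for step in (0, 625, 6250, 54000):
        print(f"step {step:>6}: lr = {learning_rate_at(step):.3e}")
```

Under this assumption the rate has been decayed 86 times by step 54000, so late training runs at a learning rate several orders of magnitude below the initial 0.16.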

Sample

$ python infer.py -c=./logs/model-54000.pth ./images/test-75.png
length: 2
digits: 7 5 10 10 10

$ python infer.py -c=./logs/model-54000.pth ./images/test-190.png
length: 3
digits: 1 9 0 10 10
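
The output above always lists five digit slots. In both samples the slots beyond the predicted length hold 10, which appears to act as a "no digit" class. A minimal sketch of turning that output into a number string, assuming this reading is correct (`decode_prediction` and `NO_DIGIT` are hypothetical names, not part of `infer.py`):

```python
NO_DIGIT = 10  # assumption: class 10 marks an unused digit slot


def decode_prediction(length, digits):
    """Join the first `length` digit classes into a number string."""
    kept = digits[:length]
    if NO_DIGIT in kept:
        # The length and digit outputs disagree; surface it instead of guessing.
        raise ValueError(f"no-digit class inside the first {length} slots: {digits}")
    return "".join(str(d) for d in kept)


print(decode_prediction(2, [7, 5, 10, 10, 10]))  # prints 75
print(decode_prediction(3, [1, 9, 0, 10, 10]))   # prints 190
```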

Loss

Requirements

Python 3.6
torch 1.0
torchvision 0.2.1
visdom
```
$ pip install visdom
```

h5py

On Ubuntu:
$ sudo apt-get install libhdf5-dev
$ sudo pip install h5py

protobuf
```
$ pip install protobuf
```
lmdb
```
$ pip install lmdb
```
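
To confirm that the packages listed above can be imported before running any script, a small check like the following may help. It is a sketch using only the standard library; `find_missing` is a hypothetical helper, and it does not verify the version numbers given above.

```python
import importlib.util

# Import names for the packages listed above.
# Note: the protobuf package is imported as "google.protobuf".
REQUIRED_MODULES = ["torch", "torchvision", "visdom", "h5py", "google.protobuf", "lmdb"]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in modules:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised when a parent package (for example "google") is absent.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    print("missing:", ", ".join(missing) if missing else "none")
```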

Setup

Clone the source code

$ git clone https://github.com/potterhsu/SVHNClassifier-PyTorch
$ cd SVHNClassifier-PyTorch

Download SVHN Dataset format 1

Extract the archives into the data folder. Your folder structure should then look like this:

SVHNClassifier-PyTorch
    - data
        - extra
            - 1.png 
            - 2.png
            - ...
            - digitStruct.mat
        - test
            - 1.png 
            - 2.png
            - ...
            - digitStruct.mat
        - train
            - 1.png 
            - 2.png
            - ...
            - digitStruct.mat
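
The layout above can be checked before conversion with a short script. This is a sketch, not part of the repository: `check_data_dir` is a hypothetical helper that only tests for the three folders, a `digitStruct.mat` in each, and at least one PNG image.

```python
from pathlib import Path

SPLITS = ("train", "test", "extra")


def check_data_dir(data_dir):
    """Return a list of problems found in the data folder (empty if none)."""
    problems = []
    root = Path(data_dir)
    for split in SPLITS:
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing folder: {split_dir}")
            continue
        if not (split_dir / "digitStruct.mat").is_file():
            problems.append(f"missing file: {split_dir / 'digitStruct.mat'}")
        if not any(split_dir.glob("*.png")):
            problems.append(f"no PNG images in: {split_dir}")
    return problems


if __name__ == "__main__":
    for line in check_data_dir("./data") or ["data folder looks complete"]:
        print(line)
```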

Usage

(Optional) Take a look at the original images with bounding boxes
```
Open `draw_bbox.ipynb` in Jupyter
```

Convert to LMDB format

$ python convert_to_lmdb.py --data_dir ./data

(Optional) Test reading the LMDBs

Open `read_lmdb_sample.ipynb` in Jupyter

Train

$ python train.py --data_dir ./data --logdir ./logs

Retrain from a checkpoint if needed

$ python train.py --data_dir ./data --logdir ./logs_retrain --restore_checkpoint ./logs/model-100.pth
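
When several checkpoints have accumulated, the most recent one can be picked by step number before passing it to `--restore_checkpoint`. The sketch below assumes checkpoints are named `model-<step>.pth`, as in the examples in this README; `latest_checkpoint` is a hypothetical helper.

```python
import re
from pathlib import Path

# Assumption: checkpoint files follow the "model-<step>.pth" naming seen above.
CHECKPOINT_PATTERN = re.compile(r"^model-(\d+)\.pth$")


def latest_checkpoint(logdir):
    """Return the checkpoint path with the highest step, or None if none exist."""
    best_step, best_path = -1, None
    for path in Path(logdir).glob("model-*.pth"):
        match = CHECKPOINT_PATTERN.match(path.name)
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path


if __name__ == "__main__":
    print(latest_checkpoint("./logs"))
```

Comparing the step as an integer matters here: a plain alphabetical sort would place `model-9.pth` after `model-54000.pth`.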

Evaluate

$ python eval.py --data_dir ./data ./logs/model-100.pth

Visualize

$ python -m visdom.server
$ python visualize.py --logdir ./logs

Infer

$ python infer.py --checkpoint=./logs/model-100.pth ./images/test1.png
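
To run inference over a whole folder of images, the command above can be wrapped in a loop. This is a sketch that simply invokes `infer.py` once per PNG file, using the same arguments as the command above; `build_infer_command` and `infer_folder` are hypothetical helpers, and the script is assumed to be run from the repository root.

```python
import subprocess
import sys
from pathlib import Path


def build_infer_command(checkpoint, image):
    """Build the same command line as shown above for one image."""
    return [sys.executable, "infer.py", f"--checkpoint={checkpoint}", str(image)]


def infer_folder(checkpoint, image_dir):
    """Run infer.py on every PNG in image_dir and print its output."""
    for image in sorted(Path(image_dir).glob("*.png")):
        result = subprocess.run(
            build_infer_command(checkpoint, image),
            stdout=subprocess.PIPE,
            universal_newlines=True,  # decode output as text (works on Python 3.6)
            check=True,               # raise if infer.py exits with an error
        )
        print(image.name)
        print(result.stdout.strip())


# Example call (requires a trained checkpoint and the repository scripts):
# infer_folder("./logs/model-100.pth", "./images")
```

Each call reloads the model, so this is convenient rather than fast.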

Clean

$ rm -rf ./logs
or
$ rm -rf ./logs_retrain

Owner

Potter Hsu
