Iterative Training: Finding Binary Weight Deep Neural Networks with Layer Binarization

Overview

This repository contains the source code for the paper (link will be posted).

Requirements

  • GPU
  • Python 3
  • PyTorch 1.9
    • Earlier versions may work but are untested.
  • pip install -r requirements.txt
  • If running the ResNet-21 or other ImageNet experiments, first download and prepare the ImageNet 2012 dataset with the bin/imagenet_prep.sh script. A consolidated setup sketch follows this list.
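
Taken together, a fresh environment can be set up in one pass. A minimal sketch, assuming a POSIX shell, that python invokes Python 3, and that bin/imagenet_prep.sh takes no arguments (check the script for its actual usage):

# Create and activate a virtual environment (optional but recommended).
python -m venv venv
. venv/bin/activate
# Install the Python dependencies.
pip install -r requirements.txt
# Prepare ImageNet 2012 (ResNet-21 / ImageNet experiments only).
./bin/imagenet_prep.sh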

Running

For non-ImageNet experiments, the main Python file is main.py. To see its arguments:

python main.py --help

The first run can take longer because the MNIST and CIFAR-10 datasets are downloaded automatically from the Internet.

For ImageNet experiments, the main Python files are main_imagenet_float.py and main_imagenet_binary.py. To see their arguments:

python main_imagenet_float.py --help

and

python main_imagenet_binary.py --help

The ImageNet dataset must already be downloaded and prepared; see the Requirements section for details.

Scripts

The main Python files have many options. The following scripts run training with the hyper-parameters given in the paper. Output includes a run-log text file and TensorBoard files. These files are saved to ./logs and reused for subsequent runs.
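
Because TensorBoard event files land under ./logs, training progress can be monitored with the standard TensorBoard CLI. A sketch, assuming the tensorboard package is installed:

# Serve the logs; then open http://localhost:6006 in a browser.
tensorboard --logdir ./logs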

300-100-10

Sensitivity Pre-training

# Layer 1. Learning rate 0.1.
./scripts/mnist/300/sensitivity/layer.sh sensitivity forward 0.1 0
# Layer 2. Learning rate 0.1.
./scripts/mnist/300/sensitivity/layer.sh sensitivity 231 0.1 0
# Layer 3. Learning rate 0.1.
./scripts/mnist/300/sensitivity/layer.sh sensitivity reverse 0.1 0

Output files and run-log are written to ./logs/mnist/val/sensitivity/.

Hyperparam Search

For floating-point training:

# Learning rate 0.1.
./scripts/mnist/300/val/float.sh hyperparam 0.1 0

For full binary training:

# Learning rate 0.1.
./scripts/mnist/300/val/binary.sh hyperparam 0.1 0

For iterative training:

# Forward order. Learning rate 0.1.
./scripts/mnist/300/val/layer.sh hyperparam forward 0.1 0
# Reverse order. Learning rate 0.1.
./scripts/mnist/300/val/layer.sh hyperparam reverse 0.1 0
# 1, 3, 2 order. Learning rate 0.1.
./scripts/mnist/300/val/layer.sh hyperparam 132 0.1 0
# 2, 1, 3 order. Learning rate 0.1.
./scripts/mnist/300/val/layer.sh hyperparam 213 0.1 0
# 2, 3, 1 order. Learning rate 0.1.
./scripts/mnist/300/val/layer.sh hyperparam 231 0.1 0
# 3, 1, 2 order. Learning rate 0.1.
./scripts/mnist/300/val/layer.sh hyperparam 312 0.1 0

Output files and run-log are written to ./logs/mnist/val/hyperparam/.

Full Training

For floating-point training:

# Learning rate 0.1. Seed 316.
./scripts/mnist/300/run/float.sh full 0.1 316 0

For full binary training:

# Learning rate 0.1. Seed 316.
./scripts/mnist/300/run/binary.sh full 0.1 316 0

For iterative training:

# Forward order. Learning rate 0.1. Seed 316.
./scripts/mnist/300/run/layer.sh full forward 0.1 316 0
# Reverse order. Learning rate 0.1. Seed 316.
./scripts/mnist/300/run/layer.sh full reverse 0.1 316 0
# 1, 3, 2 order. Learning rate 0.1. Seed 316.
./scripts/mnist/300/run/layer.sh full 132 0.1 316 0
# 2, 1, 3 order. Learning rate 0.1. Seed 316.
./scripts/mnist/300/run/layer.sh full 213 0.1 316 0
# 2, 3, 1 order. Learning rate 0.1. Seed 316.
./scripts/mnist/300/run/layer.sh full 231 0.1 316 0
# 3, 1, 2 order. Learning rate 0.1. Seed 316.
./scripts/mnist/300/run/layer.sh full 312 0.1 316 0
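
To launch all six orders back to back, the commands above can be wrapped in a shell loop. A sketch, assuming bash and sequential execution on a single GPU:

# Full training for every binarization order. Learning rate 0.1. Seed 316.
for order in forward reverse 132 213 231 312; do
  ./scripts/mnist/300/run/layer.sh full "$order" 0.1 316 0
done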

Output files and run-log are written to ./logs/mnist/run/full/.

784-100-10

Sensitivity Pre-training

# Layer 1. Learning rate 0.1.
./scripts/mnist/784/sensitivity/layer.sh sensitivity forward 0.1 0
# Layer 2. Learning rate 0.1.
./scripts/mnist/784/sensitivity/layer.sh sensitivity 231 0.1 0
# Layer 3. Learning rate 0.1.
./scripts/mnist/784/sensitivity/layer.sh sensitivity reverse 0.1 0

Output files and run-log are written to ./logs/mnist/val/sensitivity/.

Hyperparam Search

For floating-point training:

# Learning rate 0.1.
./scripts/mnist/784/val/float.sh hyperparam 0.1 0

For full binary training:

# Learning rate 0.1.
./scripts/mnist/784/val/binary.sh hyperparam 0.1 0

For iterative training:

# Forward order. Learning rate 0.1.
./scripts/mnist/784/val/layer.sh hyperparam forward 0.1 0
# Reverse order. Learning rate 0.1.
./scripts/mnist/784/val/layer.sh hyperparam reverse 0.1 0
# 1, 3, 2 order. Learning rate 0.1.
./scripts/mnist/784/val/layer.sh hyperparam 132 0.1 0
# 2, 1, 3 order. Learning rate 0.1.
./scripts/mnist/784/val/layer.sh hyperparam 213 0.1 0
# 2, 3, 1 order. Learning rate 0.1.
./scripts/mnist/784/val/layer.sh hyperparam 231 0.1 0
# 3, 1, 2 order. Learning rate 0.1.
./scripts/mnist/784/val/layer.sh hyperparam 312 0.1 0

Output files and run-log are written to ./logs/mnist/val/hyperparam/.

Full Training

For floating-point training:

# Learning rate 0.1. Seed 316.
./scripts/mnist/784/run/float.sh full 0.1 316 0

For full binary training:

# Learning rate 0.1. Seed 316.
./scripts/mnist/784/run/binary.sh full 0.1 316 0

For iterative training:

# Forward order. Learning rate 0.1. Seed 316.
./scripts/mnist/784/run/layer.sh full forward 0.1 316 0
# Reverse order. Learning rate 0.1. Seed 316.
./scripts/mnist/784/run/layer.sh full reverse 0.1 316 0
# 1, 3, 2 order. Learning rate 0.1. Seed 316.
./scripts/mnist/784/run/layer.sh full 132 0.1 316 0
# 2, 1, 3 order. Learning rate 0.1. Seed 316.
./scripts/mnist/784/run/layer.sh full 213 0.1 316 0
# 2, 3, 1 order. Learning rate 0.1. Seed 316.
./scripts/mnist/784/run/layer.sh full 231 0.1 316 0
# 3, 1, 2 order. Learning rate 0.1. Seed 316.
./scripts/mnist/784/run/layer.sh full 312 0.1 316 0

Output files and run-log are written to ./logs/mnist/run/full/.

VGG-5

Sensitivity Pre-training

# Layer 1. Learning rate 0.1.
./scripts/cifar10/vgg5/sensitivity/layer.sh sensitivity 1 0.1 0
# Layer 2. Learning rate 0.1.
./scripts/cifar10/vgg5/sensitivity/layer.sh sensitivity 2 0.1 0
# Layer 5. Learning rate 0.1.
./scripts/cifar10/vgg5/sensitivity/layer.sh sensitivity 5 0.1 0

Output files and run-log are written to ./logs/cifar10/val/sensitivity/.

Hyperparam Search

For floating-point training:

# Learning rate 0.1.
./scripts/cifar10/vgg5/val/float.sh hyperparam 0.1 0

For full binary training:

# Learning rate 0.1.
./scripts/cifar10/vgg5/val/binary.sh hyperparam 0.1 0

For iterative training:

# Forward order. Learning rate 0.1.
./scripts/cifar10/vgg5/val/layer.sh hyperparam forward 0.1 0
# Ascend order. Learning rate 0.1.
./scripts/cifar10/vgg5/val/layer.sh hyperparam ascend 0.1 0
# Reverse order. Learning rate 0.1.
./scripts/cifar10/vgg5/val/layer.sh hyperparam reverse 0.1 0
# Descend order. Learning rate 0.1.
./scripts/cifar10/vgg5/val/layer.sh hyperparam descend 0.1 0
# Random order. Learning rate 0.1.
./scripts/cifar10/vgg5/val/layer.sh hyperparam random 0.1 0

Output files and run-log are written to ./logs/cifar10/val/hyperparam/.

Full Training

For floating-point training:

# Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg5/run/float.sh full 0.1 316 0

For full binary training:

# Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg5/run/binary.sh full 0.1 316 0

For iterative training:

# Forward order. Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg5/run/layer.sh full forward 0.1 316 0
# Ascend order. Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg5/run/layer.sh full ascend 0.1 316 0
# Reverse order. Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg5/run/layer.sh full reverse 0.1 316 0
# Descend order. Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg5/run/layer.sh full descend 0.1 316 0
# Random order. Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg5/run/layer.sh full random 0.1 316 0
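
The five orders can likewise be swept in one loop. A sketch, assuming bash:

# Full training for every binarization order. Learning rate 0.1. Seed 316.
for order in forward ascend reverse descend random; do
  ./scripts/cifar10/vgg5/run/layer.sh full "$order" 0.1 316 0
done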

Output files and run-log are written to ./logs/cifar10/run/full/.

VGG-9

Sensitivity Pre-training

# Layer 1. Learning rate 0.1.
./scripts/cifar10/vgg9/sensitivity/layer.sh sensitivity 1 0.1 0
# Layer 2. Learning rate 0.1.
./scripts/cifar10/vgg9/sensitivity/layer.sh sensitivity 2 0.1 0
# Layer 5. Learning rate 0.1.
./scripts/cifar10/vgg9/sensitivity/layer.sh sensitivity 5 0.1 0

Output files and run-log are written to ./logs/cifar10/val/sensitivity/.

Hyperparam Search

For floating-point training:

# Learning rate 0.1.
./scripts/cifar10/vgg9/val/float.sh hyperparam 0.1 0

For full binary training:

# Learning rate 0.1.
./scripts/cifar10/vgg9/val/binary.sh hyperparam 0.1 0

For iterative training:

# Forward order. Learning rate 0.1.
./scripts/cifar10/vgg9/val/layer.sh hyperparam forward 0.1 0
# Ascend order. Learning rate 0.1.
./scripts/cifar10/vgg9/val/layer.sh hyperparam ascend 0.1 0
# Reverse order. Learning rate 0.1.
./scripts/cifar10/vgg9/val/layer.sh hyperparam reverse 0.1 0
# Descend order. Learning rate 0.1.
./scripts/cifar10/vgg9/val/layer.sh hyperparam descend 0.1 0
# Random order. Learning rate 0.1.
./scripts/cifar10/vgg9/val/layer.sh hyperparam random 0.1 0

Output files and run-log are written to ./logs/cifar10/val/hyperparam/.

Full Training

For floating-point training:

# Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg9/run/float.sh full 0.1 316 0

For full binary training:

# Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg9/run/binary.sh full 0.1 316 0

For iterative training:

# Forward order. Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg9/run/layer.sh full forward 0.1 316 0
# Ascend order. Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg9/run/layer.sh full ascend 0.1 316 0
# Reverse order. Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg9/run/layer.sh full reverse 0.1 316 0
# Descend order. Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg9/run/layer.sh full descend 0.1 316 0
# Random order. Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg9/run/layer.sh full random 0.1 316 0

Output files and run-log are written to ./logs/cifar10/run/full/.

ResNet-20

Sensitivity Pre-training

# Layer 1. Learning rate 0.1.
./scripts/cifar10/resnet20/sensitivity/layer.sh sensitivity 1 0.1 0
# Layer 2. Learning rate 0.1.
./scripts/cifar10/resnet20/sensitivity/layer.sh sensitivity 2 0.1 0
# ...
# Layer 20. Learning rate 0.1.
./scripts/cifar10/resnet20/sensitivity/layer.sh sensitivity 20 0.1 0
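
Rather than invoking each of the 20 layers by hand, the sweep can be scripted. A sketch, assuming bash and that only the layer index changes between invocations, as in the examples above:

# Sensitivity pre-training for layers 1 through 20. Learning rate 0.1.
for layer in $(seq 1 20); do
  ./scripts/cifar10/resnet20/sensitivity/layer.sh sensitivity "$layer" 0.1 0
done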

Output files and run-log are written to ./logs/cifar10/val/sensitivity/.

Hyperparam Search

For floating-point training:

# Learning rate 0.1.
./scripts/cifar10/resnet20/val/float.sh hyperparam 0.1 0

For full binary training:

# Learning rate 0.1.
./scripts/cifar10/resnet20/val/binary.sh hyperparam 0.1 0

For iterative training:

# Forward order. Learning rate 0.1.
./scripts/cifar10/resnet20/val/layer.sh hyperparam forward 0.1 0
# Ascend order. Learning rate 0.1.
./scripts/cifar10/resnet20/val/layer.sh hyperparam ascend 0.1 0
# Reverse order. Learning rate 0.1.
./scripts/cifar10/resnet20/val/layer.sh hyperparam reverse 0.1 0
# Descend order. Learning rate 0.1.
./scripts/cifar10/resnet20/val/layer.sh hyperparam descend 0.1 0
# Random order. Learning rate 0.1.
./scripts/cifar10/resnet20/val/layer.sh hyperparam random 0.1 0

Output files and run-log are written to ./logs/cifar10/val/hyperparam/.

Full Training

For floating-point training:

# Learning rate 0.1. Seed 316.
./scripts/cifar10/resnet20/run/float.sh full 0.1 316 0

For full binary training:

# Learning rate 0.1. Seed 316.
./scripts/cifar10/resnet20/run/binary.sh full 0.1 316 0

For iterative training:

# Forward order. Learning rate 0.1. Seed 316.
./scripts/cifar10/resnet20/run/layer.sh full forward 0.1 316 0
# Ascend order. Learning rate 0.1. Seed 316.
./scripts/cifar10/resnet20/run/layer.sh full ascend 0.1 316 0
# Reverse order. Learning rate 0.1. Seed 316.
./scripts/cifar10/resnet20/run/layer.sh full reverse 0.1 316 0
# Descend order. Learning rate 0.1. Seed 316.
./scripts/cifar10/resnet20/run/layer.sh full descend 0.1 316 0
# Random order. Learning rate 0.1. Seed 316.
./scripts/cifar10/resnet20/run/layer.sh full random 0.1 316 0

Output files and run-log are written to ./logs/cifar10/run/full/.

ResNet-21

To run the ResNet-21 experiments, first download and prepare the ImageNet dataset; see the Requirements section at the beginning of this README. The commands below assume the prepared dataset is at ./imagenet.

Sensitivity Pre-training

# Layer 1. Learning rate 0.01.
./scripts/imagenet/layer.sh sensitivity ./imagenet 20 "[20]" 20 1 0.01
# Layer 2. Learning rate 0.01.
./scripts/imagenet/layer.sh sensitivity ./imagenet 20 "[20]" 20 2 0.01
# Layer 21. Learning rate 0.01.
./scripts/imagenet/layer.sh sensitivity ./imagenet 20 "[20]" 20 21 0.01

Output files and run-log are written to ./logs/imagenet/sensitivity/.

Full Training

For floating-point training:

# Learning rate 0.01.
./scripts/imagenet/float.sh full ./imagenet 67 "[42,57]" 0.01

For full binary training:

# Learning rate 0.01.
./scripts/imagenet/binary.sh full ./imagenet 67 "[42,57]" 0.01

For iterative training:

# Forward order. Learning rate 0.01.
./scripts/imagenet/layer.sh full ./imagenet 67 "[42,57]" 2 forward 0.01
# Ascend order. Learning rate 0.01.
./scripts/imagenet/layer.sh full ./imagenet 67 "[42,57]" 2 ascend 0.01

For all scripts, output files and run-log are written to ./logs/imagenet/full/.

License

See LICENSE.

Contributing

See the contributing guide for details on how to participate in the development of this module.

Owner
Rakuten Group, Inc.