[ICLR 2022 Oral] F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization


OpenReview | arXiv | PDF | Model Zoo | BibTex

PyTorch implementation of neural network quantization with fixed-point 8-bit only multiplication.

F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization
Qing Jin1,2, Jian Ren1, Richard Zhuang1, Sumant Hanumante1, Zhengang Li2, Zhiyu Chen3, Yanzhi Wang2, Kaiyuan Yang3, Sergey Tulyakov1
1Snap Inc., 2Northeastern University, 3Rice University
ICLR 2022 Oral.

Overview

Neural network quantization enables efficient inference by reducing the precision of weights and inputs. Previous quantization methods can be categorized as simulated quantization, integer-only quantization, and fixed-point quantization, with the former two involving high-precision multiplications with 32-bit floating-point or integer scaling. In contrast, fixed-point models avoid such demanding requirements but demonstrate inferior performance to the other two methods. In this work, we study how to train such models. Specifically, we conduct a statistical analysis of the values to be quantized and propose to determine the fixed-point format from data during training with a semi-empirical formula. Our method demonstrates that high-precision multiplication is not necessary for a quantized model to achieve performance comparable to that of its full-precision counterpart.
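
As a rough illustration of what a fixed-point format means here, the sketch below quantizes values to an 8-bit fixed-point number with a given fractional length and picks that fractional length by a plain grid search over the mean squared error. The function names are hypothetical, and the grid search only stands in for the semi-empirical formula of the paper, which is not reproduced here.

```python
def quantize_fixed_point(x, frac_len, word_len=8, signed=True):
    """Round x to the nearest value representable with `word_len` bits,
    `frac_len` of which are fractional bits; out-of-range values saturate."""
    scale = 2 ** frac_len
    if signed:
        q_min, q_max = -(2 ** (word_len - 1)), 2 ** (word_len - 1) - 1
    else:
        q_min, q_max = 0, 2 ** word_len - 1
    # Python's round() rounds exact halves to the nearest even integer.
    q = max(q_min, min(q_max, round(x * scale)))  # integer code
    return q / scale


def best_frac_len(values, word_len=8, signed=True):
    """Grid-search the fractional length with the lowest mean squared error.

    Ties are resolved in favour of the smallest fractional length.
    """
    def mse(frac_len):
        return sum(
            (v - quantize_fixed_point(v, frac_len, word_len, signed)) ** 2
            for v in values
        ) / len(values)

    return min(range(word_len + 1), key=mse)


if __name__ == "__main__":
    weights = [0.5, -0.25, 0.125, 0.75]
    print(best_frac_len(weights))           # fractional length chosen by the search
    print(quantize_fixed_point(0.3, 7))     # nearest signed 8-bit value with 7 fractional bits
```

A larger fractional length gives finer resolution but a smaller representable range, which is why the format has to be matched to the statistics of the values being quantized.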

Getting Started

Requirements
  1. Please check the requirements and download packages.

  2. Prepare ImageNet-1k data following the PyTorch example, and create a softlink named data in the code directory that points to the ImageNet data path (ln -s /path/to/imagenet data).
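
The softlink step can also be done from Python. The sketch below is a hypothetical helper, not part of the repo: it assumes the ImageNet folder contains the train and val subfolders used by the PyTorch ImageNet example, and it creates the link only if it does not exist yet.

```python
import os


def link_imagenet(imagenet_root, link_name="data"):
    """Create `link_name` as a symlink to `imagenet_root` (like `ln -s`).

    Raises FileNotFoundError if the `train` or `val` subfolder is missing.
    An existing non-symlink at `link_name` makes os.symlink raise
    FileExistsError, so nothing is overwritten.
    """
    missing = [
        split for split in ("train", "val")
        if not os.path.isdir(os.path.join(imagenet_root, split))
    ]
    if missing:
        raise FileNotFoundError(f"missing ImageNet subfolders: {missing}")
    if not os.path.islink(link_name):
        os.symlink(os.path.abspath(imagenet_root), link_name)
    return os.path.realpath(link_name)
```

Calling link_imagenet("/path/to/imagenet") from the code directory has the same effect as the ln -s command above.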

Model Training
Conventional training
  • We train the model with the file distributed_run.sh and the command
    bash distributed_run.sh /path/to/yml_file batch_size
    
  • We set batch_size=2048 for conventional training of floating-/fixed-point ResNet18 and MobileNet V1/V2.
  • Before training, please update the dataset_dir and log_dir arguments in the yaml files for training the floating-/fixed-point models.
  • To train the floating-point model, please use the yaml file ***_floating_train.yml in the conventional subfolder under the corresponding folder of the model.
  • To train the fixed-point model, please first train the floating-point model, which serves as the initialization. Then use the yaml file ***_fix_quant_train.yml in the conventional subfolder under the corresponding folder of the model. Please make sure the argument fp_pretrained_file points to the correct path of the corresponding floating-point checkpoint. We also provide our pretrained floating-point models in the Model Zoo below.
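
Editing the yaml arguments by hand is enough, but when several files need the same paths a small script may help. The sketch below is a hypothetical helper that rewrites values in a yaml string; it assumes that dataset_dir, log_dir and fp_pretrained_file each sit on their own unindented `key: value` line, which may not match the actual layout of the yaml files in the repo. The paths in the example are placeholders.

```python
import re


def set_yaml_args(yaml_text, updates):
    """Replace the values of top-level `key: value` lines in a yaml string.

    Assumes each argument sits on its own unindented line; keys that are
    not found are appended at the end.
    """
    for key, value in updates.items():
        pattern = re.compile(rf"^{re.escape(key)}\s*:.*$", flags=re.MULTILINE)
        line = f"{key}: {value}"
        if pattern.search(yaml_text):
            # A function replacement avoids backslash handling in `value`.
            yaml_text = pattern.sub(lambda _match: line, yaml_text, count=1)
        else:
            yaml_text = yaml_text.rstrip("\n") + "\n" + line + "\n"
    return yaml_text


if __name__ == "__main__":
    original = "dataset_dir: data\nlog_dir: ./log\n"
    print(set_yaml_args(original, {
        "dataset_dir": "/path/to/imagenet",
        "fp_pretrained_file": "/path/to/floating_point_checkpoint",
    }))
```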
Tiny finetuning
  • We finetune the model with the file run.sh and the command

    bash run.sh /path/to/yml_file batch_size
    
  • We set batch_size=128 and use one GPU for tiny-finetuning of fixed-point ResNet18/50.

  • Before fine-tuning, please update the dataset_dir and log_dir arguments in the yaml files for finetuning the fixed-point models.

  • To finetune the fixed-point model, please use the yaml file ***_fix_quant_***_pretrained_train.yml in the tiny_finetuning subfolder under the corresponding folder of the model. For models pretrained with PytorchCV (Baseline of ResNet18 and Baseline#1 of ResNet50), the floating-point checkpoint is downloaded automatically when the code runs. For the model pretrained by Nvidia (Baseline#2 of ResNet50), please download the checkpoint first and make sure the argument nvidia_pretrained_file points to the correct path of this checkpoint.

Model Testing
  • We test the model with the file run.sh and the command

    bash run.sh /path/to/yml_file batch_size
    
  • We set batch_size=128 and use one GPU for model testing.

  • Before testing, please update the dataset_dir and log_dir arguments in the yaml files. Please update the integize_file_path and int_op_only_file_path arguments in the yaml files ***_fix_quant_test***_integize.yml and ***_fix_quant_test***_int_op_only.yml, respectively. Please also update other arguments such as nvidia_pretrained_file if necessary (even if they are not used during testing).

  • We use the following yaml files for testing:
      • ***_floating_test.yml tests the floating-point model.
      • ***_fix_quant***_test.yml tests the fixed-point model with the same setting as during training/tiny-finetuning.
      • ***_fix_quant***_test_int_model.yml tests the fixed-point model on GPU with all quantized weights, bias and inputs implemented with integers (but with float dtype, as GPU does not support integer operations), using the original modules from training (e.g. with batch normalization layers).
      • ***_fix_quant***_test_integize.yml tests the fixed-point model on GPU with all quantized weights, bias and inputs implemented with integers (but with float dtype, as GPU does not support integer operations), using a new equivalent model with only convolution, pooling and fully-connected layers.
      • ***_fix_quant***_test_int_op_only.yml tests the fixed-point model on CPU with all quantized weights, bias and inputs implemented with integers (with int dtype), using a new equivalent model with only convolution, pooling and fully-connected layers.

  • Note that the accuracy from the four fixed-point testing files can differ slightly due to numerical error.
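
To make the integer setting more concrete, the following sketch multiplies two 8-bit fixed-point numbers using only integer operations. It is a minimal illustration under stated assumptions, not code from the repo: the function names are hypothetical, and rounding by adding half before the shift is one common choice that may differ from what the exported models do.

```python
def to_int8(x, frac_len):
    """Encode a real value as a signed 8-bit integer code with `frac_len`
    fractional bits (saturating at -128 and 127)."""
    return max(-128, min(127, round(x * 2 ** frac_len)))


def fixed_point_mul(a_code, a_fl, w_code, w_fl, out_fl):
    """Multiply two int8 codes and requantize to `out_fl` fractional bits.

    The raw product carries a_fl + w_fl fractional bits, so rescaling is a
    right shift (assumes out_fl <= a_fl + w_fl) instead of a multiplication
    by a floating-point or 32-bit integer scale.
    """
    shift = a_fl + w_fl - out_fl
    product = a_code * w_code  # product of two int8 codes fits in 16 bits
    if shift > 0:
        product = (product + (1 << (shift - 1))) >> shift  # round half up
    return max(-128, min(127, product))


if __name__ == "__main__":
    a = to_int8(0.5, 6)     # activation with 6 fractional bits
    w = to_int8(-0.75, 7)   # weight with 7 fractional bits
    out = fixed_point_mul(a, 6, w, 7, out_fl=5)
    print(out, out / 2 ** 5)  # integer code and the real value it represents
```

In this example the result is exact because 0.5 * -0.75 = -0.375 is representable with 5 fractional bits; in general the shift discards low-order bits, which is one source of the small accuracy differences mentioned above.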

Model Export
  • We export the fixed-point model with integer weights, bias and inputs to run on GPU and CPU during model testing with the ***_fix_quant_test_integize.yml and ***_fix_quant_test_int_op_only.yml files, respectively.

  • The exported onnx files are saved to the path given by the arguments integize_file_path and int_op_only_file_path.

F8Net Model Zoo

All checkpoints and onnx files are available here.

Conventional

Model          Type    Top-1 Acc. (a)   Checkpoint
ResNet18       FP      70.3             Res18_32
ResNet18       8-bit   71.0             Res18_8
MobileNet-V1   FP      72.4             MBV1_32
MobileNet-V1   8-bit   72.8             MBV1_8
MobileNet-V2b  FP      72.7             MBV2b_32
MobileNet-V2b  8-bit   72.6             MBV2b_8

Tiny Finetuning

Model             Type    Top-1 Acc. (a)   Checkpoint
ResNet18          FP      73.1             Res18_32p
ResNet18          8-bit   72.3             Res18_8p
ResNet50b (BL#1)  FP      77.6             Res50b_32p
ResNet50b (BL#1)  8-bit   77.6             Res50b_8p
ResNet50b (BL#2)  FP      78.5             Res50b_32n
ResNet50b (BL#2)  8-bit   78.1             Res50b_8n

(a) The accuracies are obtained from the inference step during training. The test accuracy of the final exported model may differ slightly due to numerical error.

Technical Details

The main techniques for neural network quantization with 8-bit fixed-point multiplication include the following:

  • Quantized methods/modules including determining fixed-point formats from statistics or by grid-search, fusing convolution and batch normalization layers, and reformulating PACT with fixed-point quantization are implemented in models/fix_quant_ops.
  • Clipping-level sharing and private fractional length for residual blocks are implemented in the ResNet (models/fix_resnet) and MobileNet V2 (models/fix_mobilenet_v2).
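
As one example of the techniques listed above, the sketch below folds a batch normalization layer into the preceding convolution using the standard inference-time formula. It is a plain-Python illustration with hypothetical names and per-channel lists rather than tensors, and it is not taken from models/fix_quant_ops, whose fusion during quantized training may differ in detail.

```python
import math


def fuse_conv_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-output-channel batch-norm parameters into conv weight and bias.

    `weight` holds one flat list of kernel values per output channel; all
    other arguments are lists with one entry per output channel.
    """
    fused_weight, fused_bias = [], []
    for w_c, b_c, g, bt, m, v in zip(weight, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)           # BN scale for this channel
        fused_weight.append([w * scale for w in w_c])
        fused_bias.append((b_c - m) * scale + bt)
    return fused_weight, fused_bias


if __name__ == "__main__":
    w, b = fuse_conv_bn(
        weight=[[1.0, 2.0]], bias=[0.5],
        gamma=[2.0], beta=[0.1], mean=[0.5], var=[4.0], eps=0.0,
    )
    x = [3.0, -1.0]
    conv_then_bn = ((1.0 * x[0] + 2.0 * x[1] + 0.5) - 0.5) / 2.0 * 2.0 + 0.1
    fused = sum(wi * xi for wi, xi in zip(w[0], x)) + b[0]
    print(conv_then_bn, fused)  # the two outputs should agree
```

After fusion only a convolution with a bias remains, which is consistent with the exported models containing only convolution, pooling and fully-connected layers.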

Acknowledgement

This repo is based on AdaBits.

Citation

If our code or models help your work, please cite our paper:

@inproceedings{
  jin2022fnet,
  title={F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization},
  author={Qing Jin and Jian Ren and Richard Zhuang and Sumant Hanumante and Zhengang Li and Zhiyu Chen and Yanzhi Wang and Kaiyuan Yang and Sergey Tulyakov},
  booktitle={International Conference on Learning Representations},
  year={2022},
  url={https://openreview.net/forum?id=_CfpJazzXT2}
}