An exploration of log domain "alternative floating point" for hardware ML/AI accelerators.

Overview

This repository contains the SystemVerilog RTL, C++, HLS (Intel FPGA OpenCL to wrap RTL code) and Python needed to reproduce the numerical results in "Rethinking floating point for deep learning" [1].

There are two types of floating point implemented:

  • N-bit (N, l, alpha, beta, gamma) log with ELMA [1]
  • N-bit (N, s) (linear) posit [2]

with partial implementation of IEEE-style (e, s) floating point (likely quite buggy) and non-posit tapered log.

8-bit (8, 1, 5, 5, 7) log is the format described in "Rethinking floating point for deep learning", shown within to be more energy efficient than int8/32 integer multiply-add at 28 nm and an effective drop-in replacement for IEEE 754 binary32 single precision floating point via round to nearest even for CNN inference on ResNet-50 on ImageNet.

[1] Johnson, J. "Rethinking floating point for deep learning." (2018). https://arxiv.org/abs/1811.01721

[2] Gustafson, J. and Yonemoto, I. "Beating floating point at its own game: Posit arithmetic." Supercomputing Frontiers and Innovations 4.2 (2017): 71-86.

Requirements

You will need:

  • a PyTorch CPU installation
  • a C++11-compatible compiler to use to generate a PyTorch C++ extension module
  • the ImageNet ILSVRC12 image validation set
  • an Intel OpenCL for FPGA compatible board
  • a Quartus Prime Pro installation with the Intel OpenCL for FPGA compiler

rtl contains the SystemVerilog modules needed for the design.

bitstream contains the OpenCL that wraps the RTL modules.

cpp contains host CPU-side code for interacting with the FPGA OpenCL design.

py contains the top-level functionality to compile the CPU code and run networks.

Flow

In bitstream, run

./build_lib.sh <design>

followed by

./build_afu.sh <design> (this will take several hours to synthesize the FPGA design)

where <design> is one of loglib or positlib. The aoc/aocl tools, Quartus, Quartus license file, OpenCL BSP etc. must be in your path/environment. loglib is configured to generate a design with 8-bit (8, 1, 5, 5, 7) log arithmetic, and positlib is configured to generate a design with 8-bit (8, 1) posit arithmetic by default.

The aoc build seems to require a Python 2.x interpreter in the path, otherwise it will fail.

Update the aocx_file in py/run_fpga_resnet.py to your choice of design.

Update valdir towards the end of py/validate.py to point to a Torch dataset loader compatible installation of the ImageNet validation set.

Using a python environment with PyTorch available, in py run:

python run_fpga_resnet.py

If successful, this will run the complete validation set against the FPGA design. This requires a Python 3.x interpreter.

RTL comments

The modules used by the OpenCL design reside in rtl/log/operators and rtl/posit/operators. You can see how they are assembled here.

rtl/paper_syn contains the modules used in the paper's 28 nm synthesis results (Paper*Top.sv are the top-level modules). Waves_*.sv are the testbench programs used to generate switching activity for power analysis output.

You will have to provide your own Synopsys Design Compiler scripts/flow/cell libraries/PDK/etc. for synthesis, as we are not allowed to share details on which 28 nm semiconductor process was used or our Design Compiler synthesis scripts.

Other comments

The posit encoding implemented herein implements negative values with a sign bit rather than two's complement encoding. It is a TODO to change it, but the cost either way is largely dwarfed by other concerns in my opinion.

The FPGA design itself is not super flexible yet to support different bit widths than 8. loglib is restricted to N <= 8 bits at the moment, while positlib should be ok for N <= 16 bits, though some of the larger designs may run into FPGA resource issues if synthesized for the FPGA.

Contributions

This repo currently exists as a proof of concept. Contributions may be considered, but the design is mostly that which is needed to reproduce the results from the paper.

License

This code is licensed under CC-BY-NC 4.0.

This code also includes and uses the Single Python Fixed-Point Module for LUT SystemVerilog log-to-linear and linear-to-log mapping module generation in rtl/log/luts, which is licensed by the Python-2.4.2 license.

Owner
Facebook Research
Facebook Research
CoRe: Contrastive Recurrent State-Space Models

CoRe: Contrastive Recurrent State-Space Models This code implements the CoRe model and reproduces experimental results found in Robust Robotic Control

Apple 21 Aug 11, 2022
Companion repo of the UCC 2021 paper "Predictive Auto-scaling with OpenStack Monasca"

Predictive Auto-scaling with OpenStack Monasca Giacomo Lanciano*, Filippo Galli, Tommaso Cucinotta, Davide Bacciu, Andrea Passarella 2021 IEEE/ACM 14t

Giacomo Lanciano 0 Dec 07, 2022
PyTorch code for MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning

MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning PyTorch code for our ACL 2020 paper "MART: Memory-Augmented Recur

Jie Lei 雷杰 151 Jan 06, 2023
Code for Discriminative Sounding Objects Localization (NeurIPS 2020)

Discriminative Sounding Objects Localization Code for our NeurIPS 2020 paper Discriminative Sounding Objects Localization via Self-supervised Audiovis

51 Dec 11, 2022
Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper

Evaluating the Factual Consistency of Abstractive Text Summarization Authors: Wojciech Kryściński, Bryan McCann, Caiming Xiong, and Richard Socher Int

Salesforce 165 Dec 21, 2022
Docker containers of baseline agents for the Crafter environment

Crafter Baselines This repository contains Docker containers for running various baselines on the Crafter environment. Reward Agents DreamerV2 based o

Danijar Hafner 17 Sep 25, 2022
Pytorch implementation for "Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter".

Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter This is a pytorch-based implementation for paper Implicit Feature Alignme

wangtianwei 61 Nov 12, 2022
Official Pytorch implementation of RePOSE (ICCV2021)

RePOSE: Iterative Rendering and Refinement for 6D Object Detection (ICCV2021) [Link] Abstract We present RePOSE, a fast iterative refinement method fo

Shun Iwase 68 Nov 15, 2022
Clustering with variational Bayes and population Monte Carlo

pypmc pypmc is a python package focusing on adaptive importance sampling. It can be used for integration and sampling from a user-defined target densi

45 Feb 06, 2022
Kaggle: Cell Instance Segmentation

Kaggle: Cell Instance Segmentation The goal of this challenge is to detect cells in microscope images. with simple view on how many cels have been ann

Jirka Borovec 9 Aug 12, 2022
DyStyle: Dynamic Neural Network for Multi-Attribute-Conditioned Style Editing

DyStyle: Dynamic Neural Network for Multi-Attribute-Conditioned Style Editing Figure: Joint multi-attribute edits using DyStyle model. Great diversity

74 Dec 03, 2022
A large-scale video dataset for the training and evaluation of 3D human pose estimation models

ASPset-510 ASPset-510 (Australian Sports Pose Dataset) is a large-scale video dataset for the training and evaluation of 3D human pose estimation mode

Aiden Nibali 36 Oct 30, 2022
The official codes of "Semi-supervised Models are Strong Unsupervised Domain Adaptation Learners".

SSL models are Strong UDA learners Introduction This is the official code of paper "Semi-supervised Models are Strong Unsupervised Domain Adaptation L

Yabin Zhang 26 Dec 26, 2022
Perspective: Julia for Biologists

Perspective: Julia for Biologists 1. Examples Speed: Example 1 - Single cell data and network inference Domain: Single cell data Methodology: Network

Elisabeth Roesch 55 Dec 02, 2022
[ICLR 2021] "CPT: Efficient Deep Neural Network Training via Cyclic Precision" by Yonggan Fu, Han Guo, Meng Li, Xin Yang, Yining Ding, Vikas Chandra, Yingyan Lin

CPT: Efficient Deep Neural Network Training via Cyclic Precision Yonggan Fu, Han Guo, Meng Li, Xin Yang, Yining Ding, Vikas Chandra, Yingyan Lin Accep

26 Oct 25, 2022
Fast (simple) spectral synthesis and emission-line fitting of DESI spectra.

FastSpecFit Introduction This repository contains code and documentation to perform fast, simple spectral synthesis and emission-line fitting of DESI

5 Aug 02, 2022
[ICCV 2021] Focal Frequency Loss for Image Reconstruction and Synthesis

Focal Frequency Loss - Official PyTorch Implementation This repository provides the official PyTorch implementation for the following paper: Focal Fre

Liming Jiang 460 Jan 04, 2023
Joint Discriminative and Generative Learning for Person Re-identification. CVPR'19 (Oral)

Joint Discriminative and Generative Learning for Person Re-identification [Project] [Paper] [YouTube] [Bilibili] [Poster] [Supp] Joint Discriminative

NVIDIA Research Projects 1.2k Dec 30, 2022
Taming Transformers for High-Resolution Image Synthesis

Taming Transformers for High-Resolution Image Synthesis CVPR 2021 (Oral) Taming Transformers for High-Resolution Image Synthesis Patrick Esser*, Robin

CompVis Heidelberg 3.5k Jan 03, 2023
PyTorch implementation of EGVSR: Efficcient & Generic Video Super-Resolution (VSR)

This is a PyTorch implementation of EGVSR: Efficcient & Generic Video Super-Resolution (VSR), using subpixel convolution to optimize the inference speed of TecoGAN VSR model. Please refer to the offi

789 Jan 04, 2023