Official PyTorch implementation for "Low Precision Decentralized Distributed Training with Heterogenous Data"

Last update: Nov 23, 2021

Overview

Low Precision Decentralized Training with Heterogenous Data

[Paper]

Abstract

Decentralized distributed learning is the key to enabling large-scale machine learning (training) on edge devices utilizing private user-generated local data, without relying on the cloud. However, practical realization of such on-device training is limited by the communication bottleneck, the computational complexity of training deep models, and significant data distribution skew across devices. Many feedback-based compression techniques have been proposed in the literature to reduce the communication cost, and a few works propose algorithmic changes to aid performance in the presence of skewed data distributions by improving the convergence rate. To the best of our knowledge, there is no work in the literature that applies and shows compute-efficient training techniques such as quantization and pruning for peer-to-peer decentralized learning setups. In this paper, we analyze and show the convergence of low precision decentralized training that aims to reduce the computational complexity of training and inference. Further, we study the effect of the degree of skew and communication compression on low precision decentralized training over various computer vision and Natural Language Processing (NLP) tasks. Our experiments indicate that 8-bit decentralized training has minimal accuracy loss compared to its full precision counterpart even with heterogeneous data. However, when low precision training is accompanied by communication compression through sparsification, we observe a 1-2% drop in accuracy. The proposed low precision decentralized training decreases computational complexity, memory usage, and communication cost by ~4x while trading off less than 1% accuracy for both IID and non-IID data. In particular, with higher skew values, we observe an increase in accuracy (by ~0.5%) with low precision training, indicating the regularization effect of the quantization.

Experiments

This repository currently contains the experiments reported in the paper for low-precision CHOCO-SGD and Deep-Squeeze.
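CHOCO-SGD and Deep-Squeeze both rely on compressed communication between peers, and the abstract mentions compression through sparsification. The following is a minimal sketch of top-k sparsification with error feedback, the general idea behind such schemes. It is not the repository's implementation: the names `top_k_sparsify` and `ErrorFeedbackCompressor` are hypothetical, and plain Python lists stand in for PyTorch tensors so the example runs without dependencies.

```python
def top_k_sparsify(vector, k):
    """Keep the k largest-magnitude entries of `vector` and zero out the rest."""
    if k <= 0:
        return [0.0] * len(vector)
    order = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vector)]


class ErrorFeedbackCompressor:
    """Hypothetical helper: remembers what compression dropped and re-adds it later."""

    def __init__(self, size, k):
        self.residual = [0.0] * size  # entries dropped so far and not yet sent
        self.k = k

    def compress(self, update):
        # Add back the error left over from earlier rounds.
        corrected = [u + r for u, r in zip(update, self.residual)]
        # Only the sparse vector would be communicated to neighbours.
        sent = top_k_sparsify(corrected, self.k)
        # Whatever was not sent is carried over to the next round.
        self.residual = [c - s for c, s in zip(corrected, sent)]
        return sent


compressor = ErrorFeedbackCompressor(size=4, k=1)
first = compressor.compress([0.1, -0.9, 0.3, 0.2])
second = compressor.compress([0.1, 0.0, 0.3, 0.2])
```

In this toy run the first message carries only the entry at index 1. The second carries the entry at index 2, whose value now includes the residual accumulated from the first round, so dropped information is delayed rather than lost.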

Datasets

CIFAR-10
CIFAR-100
Imagenette
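The abstract refers to a degree of skew in how data is distributed across devices. As an illustration only, the sketch below shows one simple way to emulate such skew when splitting a labelled dataset among workers: a fraction `skew` of the samples is assigned by class label, and the remainder is spread evenly. The function `partition_with_skew` and its assignment rule are assumptions made for this example and may differ from the partitioning used in the paper or the repository.

```python
import random


def partition_with_skew(labels, num_workers, skew, seed=0):
    """Split sample indices among workers (hypothetical scheme).

    skew = 0.0 gives an IID-style split; skew = 1.0 assigns every sample by label.
    """
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)

    num_skewed = int(round(skew * len(indices)))
    skewed, uniform = indices[:num_skewed], indices[num_skewed:]

    parts = [[] for _ in range(num_workers)]
    # Label-partitioned portion: class c always goes to worker c % num_workers.
    for i in skewed:
        parts[labels[i] % num_workers].append(i)
    # Remaining portion: dealt out round-robin, independent of the label.
    for position, i in enumerate(uniform):
        parts[position % num_workers].append(i)
    return parts


toy_labels = [i % 4 for i in range(40)]  # 4 classes, 10 samples each
fully_skewed = partition_with_skew(toy_labels, num_workers=2, skew=1.0)
iid_like = partition_with_skew(toy_labels, num_workers=2, skew=0.0)
```

With `skew=1.0`, each of the two workers in the toy example sees only two of the four classes. With `skew=0.0`, both receive equally sized shares that ignore the labels.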

Models

ResNet
VGG
MobileNet

To launch the experiments, run:

sh run.sh

References

This code builds on Facebook's Stochastic Gradient Push repository to set up the decentralized learning environment. We extend the code base to include Deep-Squeeze, CHOCO-SGD, Quasi-Global Momentum, and 8-bit integer training.
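To give a feel for what 8-bit integer representation involves, here is a minimal sketch of symmetric per-tensor quantization. It is an illustration under stated assumptions, not the quantization scheme implemented in this repository: the functions `quantize_int8` and `dequantize_int8` are hypothetical, and plain Python lists are used in place of PyTorch tensors so the snippet runs as-is.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Assumption: symmetric range with one scale for the whole tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the integers and the scale."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
q_weights, scale = quantize_int8(weights)
restored = dequantize_int8(q_weights, scale)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`). Storing 8-bit integers instead of 32-bit floats is where a memory reduction of roughly 4x comes from.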

Citation

@inproceedings{
aketi2021,
title={Low Precision Decentralized Distributed Training with Heterogenous Data},
author={Sai Aparna Aketi and Sangamesh Kodge and Kaushik Roy},
booktitle={arXiv pre-print},
year={2021},
url={https://arxiv.org/abs/2111.09389}
}


Owner

Aparna Aketi
