Secure Distributed Training at Scale


This repository contains the implementation of experiments from the paper

"Secure Distributed Training at Scale"

Eduard Gorbunov*, Alexander Borzunov*, Michael Diskin, Max Ryabinin

[PDF] arxiv.org

Overview

The code is organized as follows:

  • ./resnet is a setup for training ResNet18 on CIFAR-10 with simulated Byzantine attackers
  • ./albert runs distributed training of ALBERT-large with Byzantine attacks using cloud instances

ResNet18

This setup uses torch.distributed for parallelism.
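To make the attack simulation concrete, here is a minimal, purely illustrative sketch (not the repository's code) of how a Byzantine worker in data-parallel training might corrupt its gradients before the all-reduce; the IS_BYZANTINE flag and the scaling factor are hypothetical:

```python
import torch
import torch.distributed as dist

IS_BYZANTINE = False  # hypothetical flag: set True on attacker ranks

def average_gradients(model: torch.nn.Module) -> None:
    """All-reduce gradients across workers; a Byzantine worker corrupts its share first."""
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is None:
            continue
        if IS_BYZANTINE:
            # One classic attack: flip the sign and inflate the gradient so
            # that the averaged update moves the model the wrong way.
            param.grad.mul_(-10.0)
        dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
        param.grad.div_(world_size)
```

This assumes an already-initialized process group; the actual attacks used in the experiments are implemented in the notebook below.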

Requirements
  • Python >= 3.7 (we recommend Anaconda Python 3.8)
  • Dependencies: pip install jupyter "torch>=1.6.0" "torchvision>=0.7.0" tensorboard
  • A machine with at least 16GB RAM and either a single GPU with more than 24GB of memory or 3 GPUs with at least 10GB of memory each.
  • We tested the code on Ubuntu Server 18.04; it should work with all major Linux distributions. For Windows, we recommend using Docker (e.g. via Kitematic).

Running experiments: please open ./resnet/RunExperiments.ipynb and follow the instructions in that notebook. The learning curves will be available in TensorBoard logs: tensorboard --logdir btard/resnet.

ALBERT

This setup spawns distributed nodes that collectively train ALBERT-large on WikiText-103. It uses a version of the hivemind library modified so that some peers may be programmed to become Byzantine and perform various types of attacks on the training process.
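As a rough illustration of how peers discover each other in hivemind (assuming the public API as of ~2021; the repository ships its own modified copy, so details may differ), every run starts from a DHT node that the remaining peers bootstrap from:

```python
import hivemind

# Start a DHT node; other peers join the swarm by passing one of its
# visible multiaddresses as initial_peers when creating their own DHT.
dht = hivemind.DHT(start=True)
print("initial_peers =", [str(addr) for addr in dht.get_visible_maddrs()])
```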

Requirements
  • The experiments are optimized for 16 instances each with a single T4 GPU.

    • For your convenience, we provide a cost-optimized AWS starter notebook that can run experiments (see below)
    • While the setup can be simulated on a single node, doing so requires additional tuning depending on the number and type of available GPUs.
  • If running manually, please install the core library on each machine:

    • The code requires Python >= 3.7 (we recommend Anaconda Python 3.8)
    • Install the library: cd ./albert/hivemind/ && pip install -e .
    • If successful, it should become available as import hivemind (see the quick check after this list)
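A quick sanity check after installation on each machine (this only verifies the import, not the distributed setup):

```python
# Verify that the editable install is picked up by the current environment.
import hivemind
print(hivemind.__version__)
```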

Running experiments: For your convenience, we provide a unified notebook, ./albert/experiments/RunExperiments.ipynb, that runs a distributed ALBERT experiment in the AWS cloud using preemptible T4 instances. The learning curves will be posted to the Weights & Biases (wandb) project specified during the notebook setup.

Expected cloud costs: a training experiment with 16 hosts costs approximately $60 per day with g4dn.xlarge instances and $90 per day with g4dn.2xlarge instances. A full training experiment can be expected to converge in ≈3 days. Once the model is trained, one can restart training from intermediate checkpoints and simulate attacks; one attack episode takes 4-5 hours depending on cloud availability.
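For budgeting, a back-of-the-envelope estimate based on the figures above (ignoring attack episodes and spot-price fluctuations):

```python
# Rough cost of one full convergence run (16 hosts, ~3 days), using the
# per-day figures quoted above; real spot prices vary by region and time.
daily_cost_usd = {"g4dn.xlarge": 60, "g4dn.2xlarge": 90}
days_to_converge = 3
for instance, per_day in daily_cost_usd.items():
    print(f"{instance}: ~${per_day * days_to_converge} per full run")
# -> g4dn.xlarge: ~$180 per full run
# -> g4dn.2xlarge: ~$270 per full run
```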

Owner

Yandex Research