Stochastic gradient descent with model building

Overview

Stochastic Model Building (SMB)

This repository includes a new fast and robust stochastic optimization algorithm for training deep learning models. The core idea of the algorithm is based on building models with local stochastic gradient information. The details of the algorithm is given in our recent paper.

SMB

Abstract

Stochastic gradient descent method and its variants constitute the core optimization algorithms that achieve good convergence rates for solving machine learning problems. These rates are obtained especially when these algorithms are fine-tuned for the application at hand. Although this tuning process can require large computational costs, recent work has shown that these costs can be reduced by line search methods that iteratively adjust the stepsize. We propose an alternative approach to stochastic line search by using a new algorithm based on forward step model building. This model building step incorporates a second-order information that allows adjusting not only the stepsize but also the search direction. Noting that deep learning model parameters come in groups (layers of tensors), our method builds its model and calculates a new step for each parameter group. This novel diagonalization approach makes the selected step lengths adaptive. We provide convergence rate analysis, and experimentally show that the proposed algorithm achieves faster convergence and better generalization in most problems. Moreover, our experiments show that the proposed method is quite robust as it converges for a wide range of initial stepsizes.

Keywords: model building; second-order information; stochastic gradient descent; convergence analysis

Installation

pip install git+https://github.com/sbirbil/SMB.git

Testing

Here is how you can use SMB:

import smb

optimizer = smb.SMB(model.parameters(), independent_batch=False) #independent_batch=True for SMBi optimizer

for epoch in range(100):
    
    # training steps
    model.train()
    
    for batch_index, (data, target) in enumerate(train_loader):
            
        # create loss closure for smb algorithm
        def closure():
            optimizer.zero_grad()
            loss = torch.nn.CrossEntropyLoss()(model(data), target)
            return loss
        
        # forward pass
        loss = optimizer.step(closure=closure)

You can also check our tutorial for a complete example (or the Colab notebook without installation). Set the hyper-parameter independent_batch to True in order to use the SMBi optimizer. Our paper includes more information.

Reproducing The Experiments

See the following script in order to reproduce the results in our paper.

Owner
S. Ilker Birbil
I am a faculty member working on data science and optimization.
S. Ilker Birbil
Label-Free Model Evaluation with Semi-Structured Dataset Representations

Label-Free Model Evaluation with Semi-Structured Dataset Representations Prerequisites This code uses the following libraries Python 3.7 NumPy PyTorch

8 Oct 06, 2022
Multi-Task Learning as a Bargaining Game

Nash-MTL Official implementation of "Multi-Task Learning as a Bargaining Game". Setup environment conda create -n nashmtl python=3.9.7 conda activate

Aviv Navon 87 Dec 26, 2022
Probabilistic Cross-Modal Embedding (PCME) CVPR 2021

Probabilistic Cross-Modal Embedding (PCME) CVPR 2021 Official Pytorch implementation of PCME | Paper Sanghyuk Chun1 Seong Joon Oh1 Rafael Sampaio de R

NAVER AI 87 Dec 21, 2022
XViT - Space-time Mixing Attention for Video Transformer

XViT - Space-time Mixing Attention for Video Transformer This is the official implementation of the XViT paper: @inproceedings{bulat2021space, title

Adrian Bulat 33 Dec 23, 2022
This repository contains the code for Direct Molecular Conformation Generation (DMCG).

Direct Molecular Conformation Generation This repository contains the code for Direct Molecular Conformation Generation (DMCG). Dataset Download rdkit

25 Dec 20, 2022
[ACM MM 2021] Diverse Image Inpainting with Bidirectional and Autoregressive Transformers

Diverse Image Inpainting with Bidirectional and Autoregressive Transformers Installation pip install -r requirements.txt Dataset Preparation Given the

Yingchen Yu 25 Nov 09, 2022
This repository is a basic Machine Learning train & validation Template (Using PyTorch)

pytorch_ml_template This repository is a basic Machine Learning train & validation Template (Using PyTorch) TODO Markdown 사용법 Build Docker 사용법 Anacond

1 Sep 15, 2022
GAN-based 3D human pose estimation model for 3DV'17 paper

Tensorflow implementation for 3DV 2017 conference paper "Adversarially Parameterized Optimization for 3D Human Pose Estimation". @inproceedings{jack20

Dominic Jack 15 Feb 27, 2021
SplineConv implementation for Paddle.

SplineConv implementation for Paddle This module implements the SplineConv operators from Matthias Fey, Jan Eric Lenssen, Frank Weichert, Heinrich Mül

北海若 3 Dec 29, 2021
The reference baseline of final exam for XMU machine learning course

Mini-NICO Baseline The baseline is a reference method for the final exam of machine learning course. Requirements Installation we use /python3.7 /torc

JoaquinChou 3 Dec 29, 2021
This repository contains the implementations related to the experiments of a set of publicly available datasets that are used in the time series forecasting research space.

TSForecasting This repository contains the implementations related to the experiments of a set of publicly available datasets that are used in the tim

Rakshitha Godahewa 80 Dec 30, 2022
Baselines for TrajNet++

TrajNet++ : The Trajectory Forecasting Framework PyTorch implementation of Human Trajectory Forecasting in Crowds: A Deep Learning Perspective TrajNet

VITA lab at EPFL 183 Jan 05, 2023
Unofficial Implementation of MLP-Mixer, gMLP, resMLP, Vision Permutator, S2MLPv2, RaftMLP, ConvMLP, ConvMixer in Jittor and PyTorch.

Unofficial Implementation of MLP-Mixer, gMLP, resMLP, Vision Permutator, S2MLPv2, RaftMLP, ConvMLP, ConvMixer in Jittor and PyTorch! Now, Rearrange and Reduce in einops.layers.jittor are support!!

130 Jan 08, 2023
Modeling Category-Selective Cortical Regions with Topographic Variational Autoencoders

Modeling Category-Selective Cortical Regions with Topographic Variational Autoencoders

1 Oct 11, 2021
RLHive: a framework designed to facilitate research in reinforcement learning.

RLHive is a framework designed to facilitate research in reinforcement learning. It provides the components necessary to run a full RL experiment, for both single agent and multi agent environments.

88 Jan 05, 2023
Research into Forex price prediction from price history using Deep Sequence Modeling with Stacked LSTMs.

Forex Data Prediction via Recurrent Neural Network Deep Sequence Modeling Research Paper Our research paper can be viewed here Installation Clone the

Alex Taradachuk 2 Aug 07, 2022
Next-gen Rowhammer fuzzer that uses non-uniform, frequency-based patterns.

Blacksmith Rowhammer Fuzzer This repository provides the code accompanying the paper Blacksmith: Scalable Rowhammering in the Frequency Domain that is

Computer Security Group @ ETH Zurich 173 Nov 16, 2022
Autoregressive Models in PyTorch.

Autoregressive This repository contains all the necessary PyTorch code, tailored to my presentation, to train and generate data from WaveNet-like auto

Christoph Heindl 41 Oct 09, 2022
Price-Prediction-For-a-Dream-Home - A machine learning based linear regression trained model for house price prediction.

Price-Prediction-For-a-Dream-Home ROADMAP TO THIS LINEAR REGRESSION BASED HOUSE PRICE PREDICTION PREDICTION MODEL Import all the dependencies of the p

DIKSHA DESWAL 1 Dec 29, 2021
PyTorch Code for NeurIPS 2021 paper Anti-Backdoor Learning: Training Clean Models on Poisoned Data.

Anti-Backdoor Learning PyTorch Code for NeurIPS 2021 paper Anti-Backdoor Learning: Training Clean Models on Poisoned Data. Check the unlearning effect

Yige-Li 51 Dec 07, 2022