An optimizer that trains as fast as Adam and as good as SGD.

Last update: Dec 27, 2022

Related tags

Overview

AdaBound

An optimizer that trains as fast as Adam and as good as SGD, for developing state-of-the-art deep learning models on a wide variety of popular tasks in the field of CV, NLP, and etc.

Based on Luo et al. (2019). Adaptive Gradient Methods with Dynamic Bound of Learning Rate. In Proc. of ICLR 2019.

Quick Links

Installation

AdaBound requires Python 3.6.0 or later. We currently provide PyTorch version and AdaBound for TensorFlow is coming soon.

Installing via pip

The preferred way to install AdaBound is via pip with a virtual environment. Just run

pip install adabound

in your Python environment and you are ready to go!

Using source code

As AdaBound is a Python class with only 100+ lines, an alternative way is directly downloading adabound.py and copying it to your project.

Usage

You can use AdaBound just like any other PyTorch optimizers.

optimizer = adabound.AdaBound(model.parameters(), lr=1e-3, final_lr=0.1)

As described in the paper, AdaBound is an optimizer that behaves like Adam at the beginning of training, and gradually transforms to SGD at the end. The final_lr parameter indicates AdaBound would transforms to an SGD with this learning rate. In common cases, a default final learning rate of 0.1 can achieve relatively good and stable results on unseen data. It is not very sensitive to its hyperparameters. See Appendix G of the paper for more details.

Despite of its robust performance, we still have to state that, there is no silver bullet. It does not mean that you will be free from tuning hyperparameters once using AdaBound. The performance of a model depends on so many things including the task, the model structure, the distribution of data, and etc. You still need to decide what hyperparameters to use based on your specific situation, but you may probably use much less time than before!

Demos

Thanks to the awesome work by the GitHub team and the Jupyter team, the Jupyter notebook (.ipynb) files can render directly on GitHub. We provide several notebooks (like this one) for better visualization. We hope to illustrate the robust performance of AdaBound through these examples.

For the full list of demos, please refer to this page.

Citing

If you use AdaBound in your research, please cite Adaptive Gradient Methods with Dynamic Bound of Learning Rate.

@inproceedings{Luo2019AdaBound,
  author = {Luo, Liangchen and Xiong, Yuanhao and Liu, Yan and Sun, Xu},
  title = {Adaptive Gradient Methods with Dynamic Bound of Learning Rate},
  booktitle = {Proceedings of the 7th International Conference on Learning Representations},
  month = {May},
  year = {2019},
  address = {New Orleans, Louisiana}
}

Contributors

@kayuksel

License

Apache 2.0

Comments

What is up with Epoch 150

I'm wondering what is happening at epoch 150 in all visualizations? I would like to introduce that into all my models ;-)

https://github.com/Luolc/AdaBound/blob/master/demos/cifar10/visualization.ipynb

opened by kootenpv 8
AdaBoundW

An AdaBound version with decoupled weight decay, which has been implemented to the code as an additional class, as it has been discussed in the recent issue #13.

opened by kayuksel 3
Question about the code

IIRC, because group['lr'] will never be changed, so finalr_lr will always be the same as group['final_lr']. Is this intended? https://github.com/Luolc/AdaBound/blob/6fa826003f41a57501bde3e2baab1488410fe2da/adabound/adabound.py#L110

opened by crcrpar 2
Don't work properly with higher lr

I'm new in deep learning and I found the project works well with SGD but turns to be sth wrong with adabound.

When I start with lr=1e-3, it shows as below and break down: invalid argument 2: non-empty 3D or 4D (batch mode) tensor expected for input, but got: [1 x 64 x 0 x 27] at /pytorch/aten/src/THCUNN/generic/SpatialAdaptiveMaxPooling.cu:24

But seems to work right if I set lr to 1e-4 or lower. It confused me a lot. Any ideas?

python=3.6 pytorch=1.0.1 / 0.4

opened by Ocelot7777 0
Can this deal with complex numbers?

Hi authors,

I intended to use this method on complex numbers and it turned out with a error message like:

File "optimizer.py", line 701, in step step_size.div_(denom).clamp_(lower_bound, upper_bound).mul_( RuntimeError: "clamp_scalar_cpu" not implemented for 'ComplexFloat'

I'm wondering if it's possible to improve this for complex numbers? Thanks.

Ni

opened by ni-chen 0
When did the optimizer switch to SGD？

I set the initial lr=0.0001, final_lr=0.1, but I still don't know when the optimizer will become SGD. Do I need to improve my learning rate to the final learning rate manually? thanks！

opened by yunbujian 0

Pytorch 1.6 warning

/home/xxxx/.local/lib/python3.7/site-packages/adabound/adabound.py:94: UserWarning: This overload of add_ is deprecated:
        add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
        add_(Tensor other, *, Number alpha) (Triggered internally at  /pytorch/torch/csrc/utils/python_arg_parser.cpp:766.)
  exp_avg.mul_(beta1).add_(1 - beta1, grad)

opened by MichaelMonashev 1

Learning rate changing

Hi， thanks a lot for sharing your excellent work.

I wonder if I want to change learning rate with epoch increasing, how do I set parameter lr and final_lr in adamnboound ? Or is there any need changing learining rate with epoch increasing?

Looking for your reply, thanks a lot.

opened by EddieEduardo 0
LSTM hyparameters for language modeling

Greetings,

Thanks for your great paper. I am wondering about the hyperparameters you used for language modeling experiments. Could you provide information about that?

Thank you!

opened by hoangcuong2011 0

Releases(v0.0.5)

v0.0.5(Mar 6, 2019)
Bug Fixes

Fix wrong assertion of final_lr 02e11bae10c82f6b5365f7925c8cf71252adcd52

Fix .gitignore in CIFAR-10 demo to include the learning curve data 54ef9aa6c133caf0d9c82198d46979cfdbbb12f6

Source code(tar.gz)
Source code(zip)

Owner

LoLo

A fool living in the amazing world.

GitHub Repository https://www.luolc.com/publications/adabound/

PyTorch Extension Library of Optimized Scatter Operations

PyTorch Scatter Documentation This package consists of a small extension library of highly optimized sparse update (scatter and segment) operations fo

1.2k Jan 07, 2023

PyTorch toolkit for biomedical imaging

farabio is a minimal PyTorch toolkit for out-of-the-box deep learning support in biomedical imaging. For further information, see Wikis and Docs.

47 Dec 28, 2022

Model summary in PyTorch similar to `model.summary()` in Keras

Keras style model.summary() in PyTorch Keras has a neat API to view the visualization of the model which is very helpful while debugging your network.

3.7k Dec 29, 2022

A tutorial on "Bayesian Compression for Deep Learning" published at NIPS (2017).

Code release for "Bayesian Compression for Deep Learning" In "Bayesian Compression for Deep Learning" we adopt a Bayesian view for the compression of

190 Dec 30, 2022

Implements pytorch code for the Accelerated SGD algorithm.

AccSGD This is the code associated with Accelerated SGD algorithm used in the paper On the insufficiency of existing momentum schemes for Stochastic O

205 Jan 02, 2023

PyTorch Implementation of [1611.06440] Pruning Convolutional Neural Networks for Resource Efficient Inference

PyTorch implementation of [1611.06440 Pruning Convolutional Neural Networks for Resource Efficient Inference] This demonstrates pruning a VGG16 based

836 Dec 26, 2022

A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API

micrograd A tiny Autograd engine (with a bite! :)). Implements backpropagation (reverse-mode autodiff) over a dynamically built DAG and a small neural

3.5k Jan 08, 2023

higher is a pytorch library allowing users to obtain higher order gradients over losses spanning training loops rather than individual training steps.

higher is a library providing support for higher-order optimization, e.g. through unrolled first-order optimization loops, of "meta" aspects of these

1.5k Jan 03, 2023

A code copied from google-research which named motion-imitation was rewrited with PyTorch

motor-system Introduction A code copied from google-research which named motion-imitation was rewrited with PyTorch. More details can get from this pr

6 Jan 08, 2022

Differentiable ODE solvers with full GPU support and O(1)-memory backpropagation.

PyTorch Implementation of Differentiable ODE Solvers This library provides ordinary differential equation (ODE) solvers implemented in PyTorch. Backpr

4.4k Jan 04, 2023

An optimizer that trains as fast as Adam and as good as SGD.

AdaBound An optimizer that trains as fast as Adam and as good as SGD, for developing state-of-the-art deep learning models on a wide variety of popula

2.9k Dec 27, 2022

S3-plugin is a high performance PyTorch dataset library to efficiently access datasets stored in S3 buckets.

138 Jan 03, 2023

Official implementations of EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis.

EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis This repo contains the official implementations of EigenDamage: Structured Prunin

107 Apr 20, 2022

PyTorch framework A simple and complete framework for PyTorch, providing a variety of data loading and simple task solutions that are easy to extend and migrate

12 Dec 19, 2021

An optimizer that trains as fast as Adam and as good as SGD.

Related tags

Overview

AdaBound

Quick Links

Installation

Installing via pip

Using source code

Usage

Demos

Citing

Contributors

License

Comments

Releases(v0.0.5)

v0.0.5(Mar 6, 2019)

Bug Fixes

Owner

LoLo

PyTorch Extension Library of Optimized Scatter Operations

PyTorch toolkit for biomedical imaging

Model summary in PyTorch similar to `model.summary()` in Keras

A tutorial on "Bayesian Compression for Deep Learning" published at NIPS (2017).

Implements pytorch code for the Accelerated SGD algorithm.

PyTorch Implementation of [1611.06440] Pruning Convolutional Neural Networks for Resource Efficient Inference

A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API

higher is a pytorch library allowing users to obtain higher order gradients over losses spanning training loops rather than individual training steps.

A code copied from google-research which named motion-imitation was rewrited with PyTorch

Differentiable ODE solvers with full GPU support and O(1)-memory backpropagation.

An optimizer that trains as fast as Adam and as good as SGD.

S3-plugin is a high performance PyTorch dataset library to efficiently access datasets stored in S3 buckets.

Official implementations of EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis.

PyTorch framework A simple and complete framework for PyTorch, providing a variety of data loading and simple task solutions that are easy to extend and migrate

An implementation of Performer, a linear attention-based transformer, in Pytorch

A very simple and small path tracer written in pytorch meant to be run on the GPU

High-level batteries-included neural network training library for Pytorch

Pretrained ConvNets for pytorch: NASNet, ResNeXt, ResNet, InceptionV4, InceptionResnetV2, Xception, DPN, etc.

3D-RETR: End-to-End Single and Multi-View3D Reconstruction with Transformers

PyTorch implementation of Glow, Generative Flow with Invertible 1x1 Convolutions