CosineAnnealingWithWarmup

Formulation

The learning rate is annealed using a cosine schedule over the course of learning of n_total total steps with an initial warmup period of n_warmup steps. Hence, the learning rate at step i is computed as:

Learning rate will be changed as:

Usage

# optimizer, warmup_epochs, warmup_lr, num_epochs, base_lr, final_lr, iter_per_epoch
lr_scheduler = LR_Scheduler(
        optimizer,
        args.warmup_epochs, args.warmup_lr*args.batch_size/256, 
        args.epochs, args.lr*args.batch_size/256, args.final_lr*args.batch_size/256, 
        len(train_loader),
    )

for data in range(train_loader):
  optimizer.zero_grad()
  output = model(data) 
  loss = lossfunc(output,gt)
  loss.backward()
  optimizer.step()
  lr_scheduler.step()

In CV domain [1,2], in order to automatically adapt different batch size you can use a learning rate of lr×BatchSize/256 (linear scaling [4])(we can use larger learning rate while adopting larger batch size, especially, when you use LARS optimizer[3]). Of course, you can modify it according to your specific requirements.

Reference

[1] Jean-Bastien Grill, Florian Strub, Florent Altch´e, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Do- ersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Moham- mad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, R´emi Munos, and Michal Valko. Bootstrap your own latent: A new approach to self-supervised learning. arXiv:2006.07733v1, 2020.
[2] Chen X, He K. Exploring simple siamese representation learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 15750-15758.
[3] Yang You, Igor Gitman, and Boris Ginsburg. Large batch training of convolutional networks. arXiv:1708.03888, 2017.
[4] Priya Goyal, Piotr Doll´ar, Ross Girshick, Pieter Noord- huis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv:1706.02677, 2017.
[5] https://github.com/PatrickHua/SimSiam?utm_source=catalyzex.com

Cosine Annealing With Warmup

Related tags

Overview

CosineAnnealingWithWarmup

Formulation

Usage

Reference

Owner

zhuyun

Computational inteligence project on faces in the wild dataset

Object-Centric Learning with Slot Attention

PyTorch Implementation of Exploring Explicit Domain Supervision for Latent Space Disentanglement in Unpaired Image-to-Image Translation.

Code for the ACL2021 paper "Lexicon Enhanced Chinese Sequence Labelling Using BERT Adapter"

A Low Complexity Speech Enhancement Framework for Full-Band Audio (48kHz) based on Deep Filtering.

Patch2Pix: Epipolar-Guided Pixel-Level Correspondences [CVPR2021]

Low Complexity Channel estimation with Neural Network Solutions

Code corresponding to The Introspective Agent: Interdependence of Strategy, Physiology, and Sensing for Embodied Agents

The authors' official PyTorch SigWGAN implementation

Cortex-compatible model server for Python and TensorFlow

Easy-to-use,Modular and Extendible package of deep-learning based CTR models .

Official implementation of the ICCV 2021 paper "Conditional DETR for Fast Training Convergence".

Multi-Modal Machine Learning toolkit based on PyTorch.

Pointer networks Tensorflow2

Code that accompanies the paper Semi-supervised Deep Kernel Learning: Regression with Unlabeled Data by Minimizing Predictive Variance

Python script to download the celebA-HQ dataset from google drive

Target Propagation via Regularized Inversion

gtfs2vec - Learning GTFS Embeddings for comparing PublicTransport Offer in Microregions

Portfolio Optimization and Quantitative Strategic Asset Allocation in Python

Normal Learning in Videos with Attention Prototype Network