SOTA model in CIFAR10

Last update: Dec 21, 2022

Related tags

Deep Learning pytorch-cifar-tricks

Overview

A PyTorch Implementation of CIFAR Tricks

调研了CIFAR10数据集上各种trick，数据增强，正则化方法，并进行了实现。目前项目告一段落，如果有更好的想法，或者希望一起维护这个项目可以提issue或者在我的主页找到我的联系方式。

0. Requirements

Python 3.6+
torch=1.8.0+cu111
torchvision+0.9.0+cu111
tqdm=4.26.0
PyYAML=6.0

1. Implements

1.1 Tricks

Warmup
Cosine LR Decay
SAM
Label Smooth
KD
Adabound
Xavier Kaiming init
lr finder

1.2 Augmentation

Auto Augmentation
Cutout
Mixup
RICAP
Random Erase
ShakeDrop

2. Training

2.1 CIFAR-10训练示例

WideResNet28-10 baseline on CIFAR-10:

python train.py --dataset cifar10

WideResNet28-10 +RICAP on CIFAR-10:

python train.py --dataset cifar10 --ricap True

WideResNet28-10 +Random Erasing on CIFAR-10:

python train.py --dataset cifar10 --random-erase True

WideResNet28-10 +Mixup on CIFAR-10:

python train.py --dataset cifar10 --mixup True

3. Results

3.1 原pytorch-ricap的结果

Model	Error rate	Loss	Error rate (paper)
WideResNet28-10 baseline	3.82（96.18）	0.158	3.89
WideResNet28-10 +RICAP	2.82（97.18）	0.141	2.85
WideResNet28-10 +Random Erasing	3.18（96.82）	0.114	4.65
WideResNet28-10 +Mixup	3.02（96.98）	0.158	3.02

3.2 Reimplementation结果

Model	Error rate	Loss	Error rate (paper)
WideResNet28-10 baseline	3.78（96.22）		3.89
WideResNet28-10 +RICAP	2.81（97.19）		2.85
WideResNet28-10 +Random Erasing	3.03（96.97）	0.113	4.65
WideResNet28-10 +Mixup	2.93（97.07）	0.158	3.02

3.3 Half data快速训练验证各网络结构

reimplementation models(no augmentation, half data，epoch200，bs128)

Model	Error rate	Loss
lenet(cpu爆炸)	（70.76）
wideresnet	3.78（96.22）
resnet20	（89.72）
senet	（92.34）
resnet18	（92.08）
resnet34	（92.48）
resnet50	（91.72）
regnet	（92.58）
nasnet	out of mem
shake_resnet26_2x32d	（93.06）
shake_resnet26_2x64d	（94.14）
densenet	（92.06）
dla	（92.58）
googlenet	（91.90）	0.2675
efficientnetb0(利用率低且慢)	（86.82）	0.5024
mobilenet(利用率低)	（89.18）
mobilenetv2	（91.06）
pnasnet	（90.44）
preact_resnet	（90.76）
resnext	（92.30）
vgg(cpugpu利用率都高)	（88.38）
inceptionv3	（91.84）
inceptionv4	（91.10）
inception_resnet_v2	（83.46）
rir	（92.34）	0.3932
squeezenet(CPU利用率高)	（89.16）	0.4311
stochastic_depth_resnet18	（90.22）
xception
dpn	（92.06）	0.3002
ge_resnext29_8x64d	（93.86）	巨慢

3.4 测试cpu gpu影响

TEST: scale/kernel ToyNet

修改网络的卷积层深度，并进行训练，可以得到以下结论：

结论：lenet这种卷积量比较少，只有两层的，cpu利用率高，gpu利用率低。在这个基础上增加深度，用vgg那种直筒方式增加深度，发现深度越深，cpu利用率越低，gpu利用率越高。

修改训练过程的batch size，可以得到以下结论：

结论：bs会影响收敛效果。

3.5 StepLR优化下测试cutout和mixup

architecture	epoch	cutout	mixup	C10 test acc (%)
shake_resnet26_2x64d	200			96.33
shake_resnet26_2x64d	200	√		96.99
shake_resnet26_2x64d	200		√	96.60
shake_resnet26_2x64d	200	√	√	96.46

3.6 测试SAM,ASAM,Cosine,LabelSmooth

architecture	epoch	SAM	ASAM	Cosine LR Decay	LabelSmooth	C10 test acc (%)
shake_resnet26_2x64d	200	√				96.51
shake_resnet26_2x64d	200		√			96.80
shake_resnet26_2x64d	200			√		96.61
shake_resnet26_2x64d	200				√	96.57

PS:其他库在加长训练过程（epoch=1800）情况下可以实现 shake_resnet26_2x64d achieved 97.71% test accuracy with cutout and mixup!!

3.7 测试cosine lr + shake

architecture	epoch	cutout	mixup	C10 test acc (%)
shake_resnet26_2x64d	300			96.66
shake_resnet26_2x64d	300	√		97.21
shake_resnet26_2x64d	300		√	96.90
shake_resnet26_2x64d	300	√	√	96.73

1800 epoch CIFAR ZOO中结果，由于耗时过久，未进行复现。

architecture	epoch	cutout	mixup	C10 test acc (%)
shake_resnet26_2x64d	1800			96.94（cifar zoo）
shake_resnet26_2x64d	1800	√		97.20（cifar zoo）
shake_resnet26_2x64d	1800		√	97.42（cifar zoo）
shake_resnet26_2x64d	1800	√	√	97.71（cifar zoo）

3.8 Divide and Co-training方案研究

lr:
- warmup (20 epoch)
- cosine lr decay
- lr=0.1
- total epoch(300 epoch)
bs=128
aug:
- Random Crop and resize
- Random left-right flipping
- AutoAugment
- Normalization
- Random Erasing
- Mixup
weight decay=5e-4 (bias and bn undecayed)
kaiming weight init
optimizer: nesterov

复现：((v100:gpu1) 4min*300/60=20h) top1: 97.59% 本项目目前最高值。

python train.py --model 'pyramidnet272' \
                --name 'divide-co-train' \
                --autoaugmentation True \ 
                --random-erase True \
                --mixup True \
                --epochs 300 \
                --sched 'warmcosine' \
                --optims 'nesterov' \
                --bs 128 \
                --root '/home/dpj/project/data'

3.9 测试多种数据增强

architecture	epoch	cutout	mixup	autoaugment	random-erase	C10 test acc (%)
shake_resnet26_2x64d	200					96.42
shake_resnet26_2x64d	200	√				96.49
shake_resnet26_2x64d	200		√			96.17
shake_resnet26_2x64d	200			√		96.25
shake_resnet26_2x64d	200				√	96.20
shake_resnet26_2x64d	200	√	√			95.82
shake_resnet26_2x64d	200	√		√		96.02
shake_resnet26_2x64d	200	√			√	96.00
shake_resnet26_2x64d	200		√	√		95.83
shake_resnet26_2x64d	200		√		√	95.89
shake_resnet26_2x64d	200			√	√	96.25

python train.py --model 'shake_resnet26_2x64d' --name 'ss64_orgin' --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_c' --cutout True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_m' --mixup True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_a' --autoaugmentation True  --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_r' --random-erase True  --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_cm'  --cutout True --mixup True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_ca' --cutout True --autoaugmentation True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_cr' --cutout True --random-erase True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_ma' --mixup True --autoaugmentation True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_mr' --mixup True --random-erase True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_ar' --autoaugmentation True --random-erase True  --bs 64

4. Reference

[1] https://github.com/BIGBALLON/CIFAR-ZOO

[2] https://github.com/pprp/MutableNAS

[3] https://github.com/clovaai/CutMix-PyTorch

[4] https://github.com/4uiiurz1/pytorch-ricap

[5] https://github.com/NUDTNASLab/pytorch-image-models

[6] https://github.com/facebookresearch/LaMCTS

[7] https://github.com/Alibaba-MIIL/ImageNet21K

SOTA model in CIFAR10

Related tags

Overview

A PyTorch Implementation of CIFAR Tricks

0. Requirements

1. Implements

1.1 Tricks

1.2 Augmentation

2. Training

2.1 CIFAR-10训练示例

3. Results

3.1 原pytorch-ricap的结果

3.2 Reimplementation结果

3.3 Half data快速训练验证各网络结构

3.4 测试cpu gpu影响

3.5 StepLR优化下测试cutout和mixup

3.6 测试SAM,ASAM,Cosine,LabelSmooth

3.7 测试cosine lr + shake

3.8 Divide and Co-training方案研究

3.9 测试多种数据增强

4. Reference

Owner

PJDong

Code and project page for ICCV 2021 paper "DisUnknown: Distilling Unknown Factors for Disentanglement Learning"

Single/multi view image(s) to voxel reconstruction using a recurrent neural network

Semantic Segmentation Suite in TensorFlow

Code for project: "Learning to Minimize Remainder in Supervised Learning".

Source code for paper "Deep Diffusion Models for Robust Channel Estimation", TBA.

[BMVC2021] "TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation"

Implementation of [Time in a Box: Advancing Knowledge Graph Completion with Temporal Scopes].

Hand Gesture Volume Control | Open CV | Computer Vision

Generalized and Efficient Blackbox Optimization System.

A Partition Filter Network for Joint Entity and Relation Extraction EMNLP 2021

Source code for paper "Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling", AAAI 2021

Hierarchical Cross-modal Talking Face Generation with Dynamic Pixel-wise Loss （ATVGnet）

Demos of essentia classifiers hosted on replicate.ai

Evidential Softmax for Sparse Multimodal Distributions in Deep Generative Models

Bayesian algorithm execution (BAX)

Official Pytorch implementation of 'GOCor: Bringing Globally Optimized Correspondence Volumes into Your Neural Network' (NeurIPS 2020)

Make your own game in a font!

Styled Handwritten Text Generation with Transformers (ICCV 21)

Code for "The Box Size Confidence Bias Harms Your Object Detector"

Task-based end-to-end model learning in stochastic optimization