Improving Convolutional Networks via Attention Transfer (ICLR 2017)

Last update: Dec 23, 2022

Overview

Attention Transfer

PyTorch code for "Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer" https://arxiv.org/abs/1612.03928
Conference paper at ICLR2017: https://openreview.net/forum?id=Sks9_ajex

What's in this repo so far:

Activation-based AT code for CIFAR-10 experiments
Code for ImageNet experiments (ResNet-18-ResNet-34 student-teacher)
Jupyter notebook to visualize attention maps of ResNet-34: visualize-attention.ipynb

Coming:

grad-based AT
Scenes and CUB activation-based AT code

The code uses PyTorch https://pytorch.org. The original experiments were done using torch-autograd. So far we have validated that the CIFAR-10 experiments are exactly reproducible in PyTorch, and we are in the process of doing so for ImageNet (results are very slightly worse in PyTorch, due to hyperparameters).

bibtex:

@inproceedings{Zagoruyko2017AT,
    author = {Sergey Zagoruyko and Nikos Komodakis},
    title = {Paying More Attention to Attention: Improving the Performance of
             Convolutional Neural Networks via Attention Transfer},
    booktitle = {ICLR},
    url = {https://arxiv.org/abs/1612.03928},
    year = {2017}}

Requirements

First install PyTorch, then install torchnet:

pip install git+https://github.com/pytorch/tnt.git@master

then install other Python packages:

pip install -r requirements.txt

Experiments

CIFAR-10

This section describes how to reproduce the results in Table 1 of the paper.

First, train teachers:

python cifar.py --save logs/resnet_40_1_teacher --depth 40 --width 1
python cifar.py --save logs/resnet_16_2_teacher --depth 16 --width 2
python cifar.py --save logs/resnet_40_2_teacher --depth 40 --width 2
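
The --depth and --width flags above follow the wide-ResNet naming (for example resnet_16_2 is depth 16, width 2). As a rough sketch, assuming the usual wide-ResNet convention of depth = 6n + 4 with three groups of base widths 16, 32, and 64, the helper below shows how such a pair could translate into blocks per group and channel counts. The function name wrn_config is hypothetical and is not taken from cifar.py, which may differ in details.

```python
def wrn_config(depth, width):
    """Return (blocks_per_group, channels_per_group) for a wide ResNet.

    Assumes the common convention depth = 6n + 4 with three groups;
    this is an illustrative helper, not the repository's code.
    """
    if depth < 10 or (depth - 4) % 6 != 0:
        raise ValueError("depth must have the form 6n + 4 with n >= 1")
    blocks_per_group = (depth - 4) // 6
    # Base widths 16/32/64 are scaled by the widening factor.
    channels = [int(base * width) for base in (16, 32, 64)]
    return blocks_per_group, channels


if __name__ == "__main__":
    for depth, width in [(40, 1), (16, 2), (40, 2)]:
        print(depth, width, wrn_config(depth, width))
```

Under these assumptions, a 16-2 teacher and a 16-1 student have the same number of blocks per group and differ only in channel counts.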

To train with activation-based AT do:

python cifar.py --save logs/at_16_1_16_2 --teacher_id resnet_16_2_teacher --beta 1e+3
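
For intuition about what --beta weights, here is a minimal NumPy sketch of an activation-based attention transfer term as described in the paper: each activation tensor is collapsed into a spatial attention map (channel-wise mean of squared activations), flattened, L2-normalized, and compared between student and teacher. It is written in NumPy rather than PyTorch so it runs standalone. The function names, the eps guard, and the mean-squared-difference reduction are assumptions of this sketch and not necessarily what cifar.py does.

```python
import numpy as np


def attention_map(activations, eps=1e-12):
    """Map (N, C, H, W) activations to L2-normalized vectors of shape (N, H*W)."""
    # Spatial attention: mean over channels of squared activations.
    amap = np.mean(activations ** 2, axis=1)            # (N, H, W)
    flat = amap.reshape(amap.shape[0], -1)              # (N, H*W)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    return flat / np.maximum(norms, eps)                # eps avoids division by zero


def at_loss(student_act, teacher_act):
    """Mean squared difference between normalized attention maps.

    Channel counts may differ; spatial sizes must match.
    """
    diff = attention_map(student_act) - attention_map(teacher_act)
    return float(np.mean(diff ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    student = rng.normal(size=(2, 16, 8, 8))   # narrower student
    teacher = rng.normal(size=(2, 32, 8, 8))   # wider teacher, same spatial size
    print(at_loss(student, teacher))
```

In training, a term like this would be summed over the chosen layer groups, scaled by beta, and added to the usual classification loss. Because the maps are normalized, rescaling the activations of either network leaves the term unchanged.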

To train with KD:

python cifar.py --save logs/kd_16_1_16_2 --teacher_id resnet_16_2_teacher --alpha 0.9
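
The --alpha flag balances the distillation term against the ordinary cross-entropy. The sketch below shows one common formulation of knowledge distillation for a single example, using only the standard library: a temperature-softened KL divergence between teacher and student, scaled by T squared, mixed with the hard-label cross-entropy. The function names and the default temperature of 4.0 are assumptions for illustration; the exact form used in cifar.py may differ.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of a list of logits at a given temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_z for z in scaled]


def kd_loss(student_logits, teacher_logits, label, alpha=0.9, temperature=4.0):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy."""
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    # Soft term: KL divergence between the softened distributions.
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    # Hard term: cross-entropy of the student against the true label (T = 1).
    ce = -log_softmax(student_logits)[label]
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce


if __name__ == "__main__":
    print(kd_loss([0.5, 1.0, -0.2], [0.1, 2.0, -1.0], label=1))
```

With alpha = 0 this reduces to plain cross-entropy, and with alpha = 1 the student is trained only to match the teacher's softened outputs.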

We plan to add AT+KD with decaying beta soon; this combination gives the best knowledge transfer results.

ImageNet

Pretrained model

We provide a ResNet-18 model pretrained with activation-based AT:

Model	val error (top-1, top-5), %
ResNet-18	30.4, 10.8
ResNet-18-ResNet-34-AT	29.3, 10.0

Download link: https://s3.amazonaws.com/modelzoo-networks/resnet-18-at-export.pth

Model definition: https://github.com/szagoruyko/functional-zoo/blob/master/resnet-18-at-export.ipynb

Convergence plot:

Train from scratch

Download pretrained weights for ResNet-34 (see also functional-zoo for more information):

wget https://s3.amazonaws.com/modelzoo-networks/resnet-34-export.pth

Prepare the data following fb.resnet.torch and run training (e.g. using 2 GPUs):

python imagenet.py --imagenetpath ~/ILSVRC2012 --depth 18 --width 1 \
                   --teacher_params resnet-34-export.pth --gpu_id 0,1 --ngpu 2 \
                   --beta 1e+3
