Dynamic Token Normalization Improves Vision Transformers

Last update: Oct 09, 2022

This is the PyTorch implementation of the paper Dynamic Token Normalization Improves Vision Transformers. Code and models will be available soon.

Dynamic Token Normalization

We design a novel normalization method, termed Dynamic Token Normalization (DTN), which inherits the advantages of both LayerNorm and InstanceNorm. DTN can be plugged into various transformer models as a replacement for the existing normalization layer, and it consistently improves their performance.
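As a rough illustration of what combining LayerNorm and InstanceNorm can look like, the NumPy sketch below normalizes a token matrix with statistics interpolated between per-token (LayerNorm-style, over channels) and per-channel (InstanceNorm-style, over tokens) mean and variance. The function name `blended_token_norm` and the scalar mixing weight `lam` are hypothetical, chosen only for this example; this is not the DTN formulation from the paper, which this README does not spell out.

```python
import numpy as np

def blended_token_norm(x, lam=0.5, eps=1e-6):
    """Normalize tokens x of shape (num_tokens, channels).

    lam = 1.0 -> LayerNorm-style statistics (per token, over channels)
    lam = 0.0 -> InstanceNorm-style statistics (per channel, over tokens)
    NOTE: illustrative assumption, not the paper's DTN definition.
    """
    x = np.asarray(x, dtype=np.float64)
    # Intra-token statistics: one mean/variance per token (row).
    mu_ln = x.mean(axis=1, keepdims=True)
    var_ln = x.var(axis=1, keepdims=True)
    # Inter-token statistics: one mean/variance per channel (column).
    mu_in = x.mean(axis=0, keepdims=True)
    var_in = x.var(axis=0, keepdims=True)
    # Interpolate the two sets of statistics; broadcasting expands both
    # to the shape of x.
    mu = lam * mu_ln + (1.0 - lam) * mu_in
    var = lam * var_ln + (1.0 - lam) * var_in
    return (x - mu) / np.sqrt(var + eps)

tokens = np.random.default_rng(0).normal(size=(4, 8))
out_ln = blended_token_norm(tokens, lam=1.0)  # every token has zero mean
out_in = blended_token_norm(tokens, lam=0.0)  # every channel has zero mean
```

With `lam=1.0` the result matches a LayerNorm without affine parameters, and with `lam=0.0` each channel is normalized across tokens; intermediate values mix the two behaviours.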

Comparison of top-1 and top-5 accuracy (%) on the ImageNet validation set for ViT models trained with LN and with DTN.

Model	Top-1	Top-5
ViT-T*-LN	72.3	91.4
ViT-T*-DTN	73.2	91.7
ViT-S*-LN	80.6	95.2
ViT-S*-DTN	81.7	95.8
ViT-B*-LN	81.7	95.8
ViT-B*-DTN	82.5	96.1
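The gain from replacing LN with DTN can be read off the table directly. The short snippet below recomputes it per model; the dictionary layout and the `gains` helper are hypothetical, and the numbers are copied from the table above.

```python
# (top-1, top-5) accuracy in percent, copied from the table above.
results = {
    "ViT-T*": {"LN": (72.3, 91.4), "DTN": (73.2, 91.7)},
    "ViT-S*": {"LN": (80.6, 95.2), "DTN": (81.7, 95.8)},
    "ViT-B*": {"LN": (81.7, 95.8), "DTN": (82.5, 96.1)},
}

def gains(table):
    """Return {model: (top-1 gain, top-5 gain)} of DTN over LN, in points."""
    out = {}
    for model, row in table.items():
        ln_top1, ln_top5 = row["LN"]
        dtn_top1, dtn_top5 = row["DTN"]
        # Round to one decimal to hide floating-point noise.
        out[model] = (round(dtn_top1 - ln_top1, 1), round(dtn_top5 - ln_top5, 1))
    return out

for model, (top1, top5) in gains(results).items():
    print(f"{model}: +{top1} top-1, +{top5} top-5")
```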

Getting Started

Install PyTorch

Clone the repo:

git clone https://github.com/dtn-anonymous/DTN.git

Requirements

Install CUDA==10.1 with cuDNN 7, following the official installation instructions.
Install PyTorch==1.7.1 and torchvision==0.8.2 with CUDA==10.1:

conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.1 -c pytorch

Install timm==0.3.2:

pip install timm==0.3.2
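After installation, it can be useful to confirm that the environment matches the pinned versions. The sketch below uses the standard-library `importlib.metadata` module (Python 3.8+); the `find_mismatches` helper is hypothetical, and it assumes the packages are registered under the distribution names `torch`, `torchvision`, and `timm`.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the Requirements section above.
PINNED = {"torch": "1.7.1", "torchvision": "0.8.2", "timm": "0.3.2"}

def find_mismatches(pinned=PINNED, lookup=version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        # Ignore local build suffixes such as "1.7.1+cu101".
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches

if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches().items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means all three packages match the versions listed above.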

Data Preparation

Download the ImageNet dataset. It should contain the train and val directories and the txt files that map images to labels.
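A quick sanity check of the downloaded data might look like the sketch below. The helper names are hypothetical, and the label-file format (one `<relative image path> <integer label>` pair per line) is an assumption, since this README does not specify it; adjust the parser to the actual files.

```python
from pathlib import Path

def missing_split_dirs(root, splits=("train", "val")):
    """Return the split directories that are absent under the dataset root."""
    root = Path(root)
    return [s for s in splits if not (root / s).is_dir()]

def parse_label_lines(lines):
    """Parse '<relative image path> <integer label>' lines (assumed format)."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last run of whitespace so paths may contain spaces.
        path, label = line.rsplit(maxsplit=1)
        pairs.append((path, int(label)))
    return pairs

# Made-up example lines, only to show the assumed format.
example = ["class_a/img_0001.JPEG 0", "class_b/img_0002.JPEG 1"]
print(parse_label_lines(example))
```

Calling `missing_split_dirs` with the dataset root returns an empty list when both `train` and `val` are present.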

Training a model from scratch

An example training script is provided in DTN/scripts/train.sh. To train ViT-S* with DTN, run:

cd DTN/scripts
sh train.sh layer vit_norm_s_star configs/ViT/vit.yaml

The number of GPUs and the configuration file can be changed in train.sh.

Owner

Wenqi Shao