SPT_LSA_ViT - Implementation for Visual Transformer for Small-size Datasets

Last update: Jan 01, 2023

Related tags

Deep Learning SPT_LSA_ViT

Overview

Vision Transformer for Small-Size Datasets

Seung Hoon Lee and Seunghyun Lee and Byung Cheol Song | Paper

Inha University

Abstract

Recently, the Vision Transformer (ViT), which applied the transformer structure to the image classification task, has outperformed convolutional neural networks. However, the high performance of the ViT results from pre-training using a large-size dataset such as JFT-300M, and its dependence on a large dataset is interpreted as due to low locality inductive bias. This paper proposes Shifted Patch Tokenization (SPT) and Locality Self-Attention (LSA), which effectively solve the lack of locality inductive bias and enable it to learn from scratch even on small-size datasets. Moreover, SPT and LSA are generic and effective add-on modules that are easily applicable to various ViTs. Experimental results show that when both SPT and LSA were applied to the ViTs, the performance improved by an average of 2.96% in Tiny-ImageNet, which is a representative small-size dataset. Especially, Swin Transformer achieved an overwhelming performance improvement of 4.08% thanks to the proposed SPT and LSA.

Method

Shifted Patch Tokenization

Locality Self-Attention

Model Performance

Small-Size Dataset Classification

Model	FLOPs	CIFAR10	CIFAR100	SVHN	Tiny-ImageNet
ViT	189.8	93.58	73.81	97.82	57.07
SL-ViT	199.2	94.53	76.92	97.79	61.07
T2T	643.0	95.30	77.00	97.90	60.57
SL-T2T	671.4	95.57	77.36	97.91	61.83
CaiT	613.8	94.91	76.89	98.13	64.37
SL-CaiT	623.3	95.81	80.32	98.28	67.18
PiT	279.2	94.24	74.99	97.83	60.25
SL-PiT	322.9	95.88	79.00	97.93	62.91
Swin	242.3	94.46	76.87	97.72	60.87
SL-Swin	284.9	95.93	79.99	97.92	64.95

Accuracy-Throughput Graph

How to train models

Pure ViT

python main.py --model vit

SL-Swin

python main.py --model swin --is_LSA --is_SPT

Citation

@misc{lee2021vision,
      title={Vision Transformer for Small-Size Datasets}, 
      author={Seung Hoon Lee and Seunghyun Lee and Byung Cheol Song},
      year={2021},
      eprint={2112.13492},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

SPT_LSA_ViT - Implementation for Visual Transformer for Small-size Datasets

Related tags

Overview

Vision Transformer for Small-Size Datasets

Abstract

Method

Shifted Patch Tokenization

Locality Self-Attention

Model Performance

Small-Size Dataset Classification

Accuracy-Throughput Graph

How to train models

Pure ViT

SL-Swin

Citation

Owner

Lee SeungHoon

SSD-based Object Detection in PyTorch

Elevation Mapping on GPU.

Pytorch implementation of "Neural Wireframe Renderer: Learning Wireframe to Image Translations"

Fully Convolutional Networks for Semantic Segmentation by Jonathan Long, Evan Shelhamer, and Trevor Darrell. CVPR 2015 and PAMI 2016.

Reproduce partial features of DeePMD-kit using PyTorch.

9th place solution

📖 Deep Attentional Guided Image Filtering

Co-GAIL: Learning Diverse Strategies for Human-Robot Collaboration

Mosaic of Object-centric Images as Scene-centric Images (MosaicOS) for long-tailed object detection and instance segmentation.

Leveraging Social Influence based on Users Activity Centers for Point-of-Interest Recommendation

Implementation of the paper "Language-agnostic representation learning of source code from structure and context".

Code for Transformer Hawkes Process, ICML 2020.

NER for Indian languages

kapre: Keras Audio Preprocessors

Creating predictive checklists from data using integer programming.

This is the pytorch implementation of the paper - Axiomatic Attribution for Deep Networks.

Python/Rust implementations and notes from Proofs Arguments and Zero Knowledge

A PyTorch Implementation of PGL-SUM from "Combining Global and Local Attention with Positional Encoding for Video Summarization", Proc. IEEE ISM 2021

PartImageNet is a large, high-quality dataset with part segmentation annotations

social humanoid robots with GPGPU and IoT

SPT_LSA_ViT - Implementation for Visual Transformer for Small-size Datasets

Related tags

Overview

Vision Transformer for Small-Size Datasets

Abstract

Method

Shifted Patch Tokenization

Locality Self-Attention

Model Performance

Small-Size Dataset Classification

Accuracy-Throughput Graph

How to train models

Pure ViT

SL-Swin

Citation

Owner

Lee SeungHoon

SSD-based Object Detection in PyTorch

Elevation Mapping on GPU.

Pytorch implementation of "Neural Wireframe Renderer: Learning Wireframe to Image Translations"

Fully Convolutional Networks for Semantic Segmentation by Jonathan Long*, Evan Shelhamer*, and Trevor Darrell. CVPR 2015 and PAMI 2016.

Reproduce partial features of DeePMD-kit using PyTorch.

9th place solution

📖 Deep Attentional Guided Image Filtering

Co-GAIL: Learning Diverse Strategies for Human-Robot Collaboration

Mosaic of Object-centric Images as Scene-centric Images (MosaicOS) for long-tailed object detection and instance segmentation.

Leveraging Social Influence based on Users Activity Centers for Point-of-Interest Recommendation

Implementation of the paper "Language-agnostic representation learning of source code from structure and context".

Code for Transformer Hawkes Process, ICML 2020.

NER for Indian languages

kapre: Keras Audio Preprocessors

Creating predictive checklists from data using integer programming.

This is the pytorch implementation of the paper - Axiomatic Attribution for Deep Networks.

Python/Rust implementations and notes from Proofs Arguments and Zero Knowledge

A PyTorch Implementation of PGL-SUM from "Combining Global and Local Attention with Positional Encoding for Video Summarization", Proc. IEEE ISM 2021

PartImageNet is a large, high-quality dataset with part segmentation annotations

social humanoid robots with GPGPU and IoT

Fully Convolutional Networks for Semantic Segmentation by Jonathan Long, Evan Shelhamer, and Trevor Darrell. CVPR 2015 and PAMI 2016.