Sequencer: Deep LSTM for Image Classification

Last update: Dec 16, 2022

This repository contains the implementation of Sequencer.

Abstract

In recent computer vision research, the advent of the Vision Transformer (ViT) has rapidly revolutionized various architectural design efforts: ViT achieved state-of-the-art image classification performance using the self-attention found in natural language processing, and MLP-Mixer achieved competitive performance using simple multi-layer perceptrons. In contrast, several studies have also suggested that carefully redesigned convolutional neural networks (CNNs) can achieve advanced performance comparable to ViT without resorting to these new ideas. Against this background, there is growing interest in what inductive bias is suitable for computer vision. Here we propose Sequencer, a novel and competitive architectural alternative to ViT that provides a new perspective on these issues. Unlike ViTs, Sequencer models long-range dependencies using LSTMs rather than self-attention layers. We also propose a two-dimensional version of the Sequencer module, in which an LSTM is decomposed into vertical and horizontal LSTMs to enhance performance. Despite its simplicity, Sequencer performs impressively well in several experiments: Sequencer2D-L, with 54M parameters, achieves 84.6% top-1 accuracy when trained only on ImageNet-1K. Moreover, we show that it has good transferability and robust resolution adaptability on the double resolution band.

Schematic diagrams

The overall architecture of Sequencer2D is similar to typical hierarchical ViTs and Visual MLPs, but it uses Sequencer2D blocks instead of Transformer blocks.

A Sequencer2D block replaces the Transformer's self-attention layer with an LSTM-based layer such as the BiLSTM2D layer.
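A minimal, dependency-free sketch of how such a block can be wired, assuming the pre-norm layout familiar from ViT: LayerNorm, then the token-mixing layer (BiLSTM2D in Sequencer2D), then a residual add; followed by LayerNorm, a per-token channel MLP with GELU, and a second residual add. The token mixer is passed in as a callable so the sketch does not depend on an LSTM implementation. The names `sequencer_block`, `layer_norm`, `linear`, and the `mlp` weight tuple are hypothetical and not taken from the repository, which is built on PyTorch.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Grid = List[List[Vector]]  # H x W grid of C-dimensional token vectors


def layer_norm(v: Vector, eps: float = 1e-6) -> Vector:
    # Normalize one token over its C channels (learned scale/shift omitted here).
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]


def gelu(x: float) -> float:
    # Exact GELU via the Gaussian CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(W: List[Vector], b: Vector, v: Vector) -> Vector:
    # y = W v + b, with W stored as a list of rows.
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def add(a: Grid, b: Grid) -> Grid:
    # Element-wise sum of two H x W x C grids (the residual connection).
    return [[[p + q for p, q in zip(s, t)] for s, t in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def sequencer_block(x: Grid, token_mixer: Callable[[Grid], Grid],
                    mlp: Tuple[List[Vector], Vector, List[Vector], Vector]) -> Grid:
    W1, b1, W2, b2 = mlp
    # 1) Token mixing (e.g. BiLSTM2D) on the normalized grid, plus a residual.
    x = add(x, token_mixer([[layer_norm(t) for t in row] for row in x]))
    # 2) Channel MLP applied independently to every token, plus a residual.
    mlp_out = [[linear(W2, b2, [gelu(h) for h in linear(W1, b1, layer_norm(t))])
                for t in row] for row in x]
    return add(x, mlp_out)
```

Because both sub-layers sit on residual paths, a mixer and MLP that output zeros leave the input unchanged, which makes the block easy to stack.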

BiLSTM2D includes a vertical LSTM and a horizontal LSTM, each of which is bidirectional.
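To make the decomposition concrete, here is a dependency-free sketch written for readability rather than speed. It assumes that a bidirectional LSTM runs down every column (vertical) and along every row (horizontal), that the two outputs are concatenated channel-wise at each position, and that a point-wise fully connected layer maps the result back to C channels. The hidden size, the weight initialization, and the names `LSTM`, `BiLSTM`, and `BiLSTM2D` used here are illustrative assumptions, not the repository's PyTorch code.

```python
import math
import random
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(M: List[Vector], v: Vector) -> Vector:
    # Matrix-vector product with M stored as a list of rows.
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LSTM:
    """One-direction LSTM; gates stacked as [input, forget, cell, output]
    (the same order PyTorch's nn.LSTM uses for its weights)."""

    def __init__(self, in_dim: int, hidden: int, rng: random.Random):
        self.hidden = hidden

        def init(rows: int, cols: int) -> List[Vector]:
            # Small uniform init; an arbitrary choice for this sketch.
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.W = init(4 * hidden, in_dim)  # input-to-gate weights
        self.U = init(4 * hidden, hidden)  # hidden-to-gate weights
        self.b = [0.0] * (4 * hidden)

    def run(self, seq: List[Vector]) -> List[Vector]:
        n = self.hidden
        h, c, outputs = [0.0] * n, [0.0] * n, []
        for x in seq:
            z = [a + u + b for a, u, b in zip(matvec(self.W, x), matvec(self.U, h), self.b)]
            i = [sigmoid(v) for v in z[:n]]
            f = [sigmoid(v) for v in z[n:2 * n]]
            g = [math.tanh(v) for v in z[2 * n:3 * n]]
            o = [sigmoid(v) for v in z[3 * n:]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs


class BiLSTM:
    """Forward and backward LSTMs; per-step outputs concatenated (2 * hidden)."""

    def __init__(self, in_dim: int, hidden: int, rng: random.Random):
        self.fwd = LSTM(in_dim, hidden, rng)
        self.bwd = LSTM(in_dim, hidden, rng)

    def run(self, seq: List[Vector]) -> List[Vector]:
        forward = self.fwd.run(seq)
        backward = self.bwd.run(seq[::-1])[::-1]  # re-align to the original order
        return [fw + bw for fw, bw in zip(forward, backward)]


class BiLSTM2D:
    """Vertical and horizontal BiLSTMs, concatenated, then a point-wise FC layer."""

    def __init__(self, channels: int, hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.vertical = BiLSTM(channels, hidden, rng)
        self.horizontal = BiLSTM(channels, hidden, rng)
        # Point-wise FC: 4 * hidden concatenated features -> channels.
        self.fc_w = [[rng.uniform(-0.1, 0.1) for _ in range(4 * hidden)]
                     for _ in range(channels)]
        self.fc_b = [0.0] * channels

    def __call__(self, x: List[List[Vector]]) -> List[List[Vector]]:
        height, width = len(x), len(x[0])
        # Vertical pass: every column is a sequence of length `height`.
        ver = [[None] * width for _ in range(height)]
        for j in range(width):
            column_out = self.vertical.run([x[i][j] for i in range(height)])
            for i in range(height):
                ver[i][j] = column_out[i]
        # Horizontal pass: every row is a sequence of length `width`.
        hor = [self.horizontal.run(row) for row in x]
        # Concatenate both directions of both passes, project back to C channels.
        return [[[s + b for s, b in zip(matvec(self.fc_w, ver[i][j] + hor[i][j]), self.fc_b)]
                 for j in range(width)] for i in range(height)]
```

Each position (i, j) sees its entire row and column within one layer, a cross-shaped receptive field, and stacking layers spreads information over the whole image. Each LSTM runs over sequences of length H or W rather than H × W.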

Model Zoo

We provide our Sequencer models pretrained on ImageNet-1K:

name	arch	Params	FLOPs	Acc@1 (%)	download
Sequencer2D-S	`sequencer2d_s`	28M	8.4G	82.3	here
Sequencer2D-M	`sequencer2d_m`	38M	11.1G	82.8	here
Sequencer2D-L	`sequencer2d_l`	54M	16.6G	83.4	here

Usage

Requirements

torch>=1.10.0
torchvision
timm==0.5.4
Pillow
matplotlib
scipy
etc., see requirements.txt

Data preparation

Download and extract the ImageNet images. The directory structure should be as follows, with the validation images also sorted into per-class folders.

│imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
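Before launching a long run, it can help to check that the folders match the tree above. The sketch below uses only the standard library; `check_imagenet_layout` is a hypothetical helper, not part of the repository. It verifies that `train/` and `val/` exist, that both contain the same class folders, and counts the images in each split. For the full ImageNet-1K split, one would expect 1000 class folders.

```python
from pathlib import Path
from typing import Dict

IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}


def check_imagenet_layout(root: str) -> Dict[str, int]:
    """Return image counts per split plus the class count; raise if the layout looks wrong."""
    root_path = Path(root)
    classes = {}
    counts = {}
    for split in ("train", "val"):
        split_dir = root_path / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing directory: {split_dir}")
        class_dirs = sorted(p for p in split_dir.iterdir() if p.is_dir())
        classes[split] = {p.name for p in class_dirs}
        # Count image files one level below each class folder.
        counts[split] = sum(
            1 for d in class_dirs for f in d.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    if classes["train"] != classes["val"]:
        mismatched = sorted(classes["train"] ^ classes["val"])
        raise ValueError(f"train/val class folders differ, e.g. {mismatched[:5]}")
    counts["classes"] = len(classes["train"])
    return counts
```

A flat `val/` folder (as the validation archive is distributed) fails the class-folder comparison, which is exactly the mistake this check is meant to catch.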

Training

Command line for training Sequencer models on ImageNet from scratch with 8 GPUs (the first argument of distributed_train.sh is the number of processes).

./distributed_train.sh 8 /path/to/imagenet --model sequencer2d_s -b 256 -j 8 --opt adamw --epochs 300 --sched cosine --native-amp --img-size 224 --drop-path 0.1 --lr 2e-3 --weight-decay 0.05 --remode pixel --reprob 0.25 --aa rand-m9-mstd0.5-inc1 --smoothing 0.1 --mixup 0.8 --cutmix 1.0 --warmup-lr 1e-6 --warmup-epochs 20
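The command above ramps the learning rate linearly from `--warmup-lr 1e-6` to `--lr 2e-3` over 20 warmup epochs and then decays it with a cosine schedule (`--sched cosine`). Also note that `-b` in timm's training script is the per-process batch size, so 8 processes × `-b 256` give a global batch of 2048. The sketch below approximates such a per-epoch schedule in plain Python. It is not timm's scheduler: timm may append cooldown epochs (the fine-tuning command disables them with `--cooldown-epochs 0`), and its exact warmup and decay bookkeeping may differ. The helper names and the `min_lr` value of 1e-6 are assumptions.

```python
import math


def lr_at_epoch(epoch: int, base_lr: float = 2e-3, warmup_lr: float = 1e-6,
                warmup_epochs: int = 20, total_epochs: int = 300,
                min_lr: float = 1e-6) -> float:
    """Linear warmup followed by cosine decay (an approximation, not timm's code)."""
    if epoch < warmup_epochs:
        # Linear ramp from warmup_lr toward base_lr.
        return warmup_lr + epoch * (base_lr - warmup_lr) / warmup_epochs
    # Cosine decay from base_lr to min_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def global_batch_size(num_processes: int, per_process_batch: int) -> int:
    # Each distributed process loads its own batch of size -b.
    return num_processes * per_process_batch
```

By the same arithmetic, the higher-resolution fine-tuning command uses a global batch of 8 × 64 = 512, and the CIFAR-10 command 4 × 128 = 512.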

Command line for fine-tuning a pre-trained model at a higher resolution (392×392).

./distributed_train.sh 8 /path/to/imagenet --model sequencer2d_l --pretrained -b 64 -j 8 --opt adamw --epochs 30 --sched cosine --native-amp --input-size 3 392 392 --img-size 392 --crop-pct 1.0 --drop-path 0.4 --lr 5e-5 --weight-decay 1e-8 --remode pixel --reprob 0.25 --aa rand-m9-mstd0.5-inc1 --smoothing 0.1 --mixup 0.8 --cutmix 1.0 --warmup-epochs 0 --cooldown-epochs 0
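In timm, `--crop-pct` affects the evaluation-time preprocessing: the shorter image side is resized to `floor(img_size / crop_pct)` and then center-cropped to `img_size`. A small sketch of that arithmetic follows (`eval_resize_and_crop` is a hypothetical helper). With a `crop_pct` of 0.875, timm's usual default, a 224 input is resized to 256 before cropping; with `--crop-pct 1.0` at 392, the image is resized straight to 392 and nothing is cropped away.

```python
import math


def eval_resize_and_crop(img_size: int, crop_pct: float):
    """Return (resize_to, kept_fraction) for the shorter image side."""
    resize_to = math.floor(img_size / crop_pct)  # resize before the center crop
    kept = img_size / resize_to                  # fraction of that side kept by the crop
    return resize_to, kept
```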

Command line for fine-tuning a pre-trained model on a transfer learning dataset (CIFAR-10 in this example).

./distributed_train.sh 4 /path/to/cifar10 --model sequencer2d_s -b 128 -j 4 --num-classes 10 --dataset torch/cifar10 --pretrained --opt adamw --epochs 200 --sched cosine --native-amp --img-size 224 --clip-grad 1 --drop-path 0.1 --lr 0.0001 --weight-decay 1e-4 --remode pixel --aa rand-m9-mstd0.5-inc1 --smoothing 0.1 --mixup 0.8 --cutmix 1.0 --warmup-lr 1e-6 --warmup-epochs 5

Validation

To evaluate our Sequencer models, run:

python validate.py /path/to/imagenet --model sequencer2d_s -b 16 --input-size 3 224 224 --amp
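The Acc@1 column in the model zoo is top-1 accuracy: the percentage of validation images whose highest-scoring class matches the label. timm's validate.py reports top-1 and top-5 accuracy using its own PyTorch utilities. The plain-Python sketch below shows the same metric for a batch of logits; `topk_accuracy` is a hypothetical helper.

```python
from typing import Sequence


def topk_accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int], k: int = 1) -> float:
    """Percentage of samples whose label is among the k highest-scoring classes."""
    hits = 0
    for scores, label in zip(logits, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        hits += label in ranked[:k]
    return 100.0 * hits / len(labels)
```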

Reference

You may want to cite:

@article{tatsunami2022sequencer,
  title={Sequencer: Deep LSTM for Image Classification},
  author={Tatsunami, Yuki and Taki, Masato},
  journal={arXiv preprint arXiv:2205.01972},
  year={2022}
}

Acknowledgment

This implementation is based on pytorch-image-models by Ross Wightman. We thank him for his brilliant work.


We thank the Graduate School of Artificial Intelligence and Science, Rikkyo University (Rikkyo AI), which supports us with computational resources, facilities, and more.
AnyTech Co., Ltd. provided valuable comments and encouragement on early versions. We thank them for their cooperation, and in particular Atsushi Fukuda for organizing discussion opportunities.


Releases: weights (Apr 28, 2022)

Owner: Yuki Tatsunami
