This is an official implementation for "ResT: An Efficient Transformer for Visual Recognition".

Last update: Dec 13, 2022

Overview

ResT

By Qing-Long Zhang and Yu-Bin Yang

State Key Laboratory for Novel Software Technology, Nanjing University

This repo is the official implementation of "ResT: An Efficient Transformer for Visual Recognition". It currently includes code and models for the following tasks:

Image Classification: Included in this repo. See get_started.md for a quick start.

Object Detection and Instance Segmentation: Based on detectron2, coming soon.

ResT was first described on arXiv. It is designed to serve as a general-purpose backbone for computer vision and can handle input images of arbitrary size. In addition, ResT compresses the memory cost of standard multi-head self-attention (MSA) and models the interaction among attention heads while preserving their diversity.
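The paragraph above names two mechanisms without showing them. The following is a minimal NumPy sketch of how they might fit together, assuming the efficient MSA (EMSA) formulation from the ResT paper, IN(Softmax(Conv(QKᵀ/√d_k)))V, where keys and values are computed from a spatially reduced copy of the input and a 1×1 convolution mixes the attention maps across heads. Assumptions to note: the learned depth-wise convolution (kernel s+1, stride s, padding s//2) is replaced by a fixed averaging kernel, all projections are random rather than learned, and the function names `depthwise_reduce` and `emsa` are hypothetical, not taken from this repo.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token over the channel axis (learnable affine omitted).
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def depthwise_reduce(x, h, w, s):
    """Hypothetical spatial reduction of tokens x (h*w, d) by stride s.
    Assumption: a fixed averaging window stands in for the learned
    depth-wise conv (kernel s+1, stride s, padding s//2)."""
    n, d = x.shape
    k, p = s + 1, s // 2
    padded = np.pad(x.reshape(h, w, d), ((p, p), (p, p), (0, 0)))
    ho = (h + 2 * p - k) // s + 1
    wo = (w + 2 * p - k) // s + 1
    out = np.empty((ho, wo, d))
    for i in range(ho):
        for j in range(wo):
            out[i, j] = padded[i * s:i * s + k, j * s:j * s + k].mean(axis=(0, 1))
    return out.reshape(ho * wo, d)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def emsa(x, h, w, num_heads, s, rng):
    """Sketch of EMSA on tokens x (h*w, d); returns (output, attention shape)."""
    n, d = x.shape
    dk = d // num_heads
    # Random stand-ins for the learned projection matrices (assumption).
    Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
    # 1x1 conv over the head axis == a (heads x heads) mixing matrix.
    head_mix = rng.standard_normal((num_heads, num_heads)) / np.sqrt(num_heads)

    q = (x @ Wq).reshape(n, num_heads, dk).transpose(1, 0, 2)      # (heads, n, dk)
    # Keys/values come from the reduced token set, shrinking the attention map.
    xr = x if s == 1 else layer_norm(depthwise_reduce(x, h, w, s))
    m = xr.shape[0]
    k = (xr @ Wk).reshape(m, num_heads, dk).transpose(1, 0, 2)     # (heads, m, dk)
    v = (xr @ Wv).reshape(m, num_heads, dk).transpose(1, 0, 2)     # (heads, m, dk)

    attn = q @ k.transpose(0, 2, 1) / np.sqrt(dk)                  # (heads, n, m)
    attn = np.einsum('gh,hnm->gnm', head_mix, attn)                # mix heads
    attn = softmax(attn, axis=-1)
    # Instance norm: normalize each head's (n, m) map independently.
    mu = attn.mean(axis=(1, 2), keepdims=True)
    var = attn.var(axis=(1, 2), keepdims=True)
    attn = (attn - mu) / np.sqrt(var + 1e-5)

    out = (attn @ v).transpose(1, 0, 2).reshape(n, d)
    return out @ Wo, attn.shape

rng = np.random.default_rng(0)
x = rng.standard_normal((56 * 56, 64))
out, attn_shape = emsa(x, 56, 56, num_heads=2, s=8, rng=rng)
print(out.shape, attn_shape)  # (3136, 64) (2, 3136, 49)
```

Because h and w are passed in rather than fixed, the same function runs on non-square inputs (for example 30 × 45), which is one way to read the "arbitrary size" claim. With n = 3136 tokens and s = 8, each head's attention map is 3136 × 49 instead of 3136 × 3136, an s² = 64× reduction.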

Main Results on ImageNet with Pretrained Models

ImageNet-1K Pretrained Models

name	resolution	acc@1 (%)	acc@5 (%)	#params	FLOPs	FPS	1K model
ResT-Lite	224x224	77.2	93.7	10.5M	1.4G	1246	baidu
ResT-Small	224x224	79.6	94.9	13.7M	1.9G	1043	baidu
ResT-Base	224x224	81.6	95.7	30.3M	4.3G	673	baidu
ResT-Large	224x224	83.6	96.3	51.6M	7.9G	429	baidu

Note: the access code for the Baidu download links is rest.

Citing ResT

@article{zhql2021ResT,
  title={ResT: An Efficient Transformer for Visual Recognition},
  author={Zhang, Qinglong and Yang, Yubin},
  journal={arXiv preprint arXiv:2105.13677v2},
  year={2021}
}


Owner

zhql
