"Exploring Vision Transformers for Fine-grained Classification" at CVPRW FGVC8

Last update: Dec 06, 2022

Overview

FGVC8

Exploring Vision Transformers for Fine-grained Classification paper presented at the CVPR 2021, The Eight Workshop on Fine-Grained Visual Categorization on June 25th.

Abstract

Existing computer vision research in categorization struggles with fine-grained attributes recognition due to the inherently high intra-class variances and low inter-class variances. SOTA methods tackle this challenge by locating the most informative image regions and rely on them to classify the complete image. The most recent work, Vision Transformer (ViT), shows its strong performance in both traditional and fine-grained classification tasks.

In this work, we propose a multi-stage ViT framework for fine-grained image classification tasks, which localizes the informative image regions without requiring architectural changes using the inherent multi-head self-attention mechanism. We also introduce attention-guided augmentations for improving the model's capabilities.

We demonstrate the value of our approach by experimenting with four popular fine-grained benchmarks: CUB-200-2011, Stanford Cars, Stanford Dogs, and FGVC7 Plant Pathology. We also prove our model's interpretability via qualitative results.

Instructions

Upcoming

Citation

If you find interesting our results, or you use or code/ideas please consider to cite our work:

@misc{conde2021exploring,
      title={Exploring Vision Transformers for Fine-grained Classification}, 
      author={Marcos V. Conde and Kerem Turgutlu},
      year={2021},
      eprint={2106.10587},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

"Exploring Vision Transformers for Fine-grained Classification" at CVPRW FGVC8

Related tags

Overview

FGVC8

Abstract

Instructions

Citation

References

Owner

Marcos V. Conde

A Temporal Extension Library for PyTorch Geometric

Densely Connected Search Space for More Flexible Neural Architecture Search (CVPR2020)

Regularizing Generative Adversarial Networks under Limited Data (CVPR 2021)

Registration Loss Learning for Deep Probabilistic Point Set Registration

AAAI-22 paper: SimSR: Simple Distance-based State Representationfor Deep Reinforcement Learning

Mapping Conditional Distributions for Domain Adaptation Under Generalized Target Shift

Lux AI environment interface for RLlib multi-agents

Training Cifar-10 Classifier Using VGG16

Tools for investing in Python

Mosaic of Object-centric Images as Scene-centric Images (MosaicOS) for long-tailed object detection and instance segmentation.

Code for NAACL 2021 full paper "Efficient Attentions for Long Document Summarization"

Official git for "CTAB-GAN: Effective Table Data Synthesizing"

Train the HRNet model on ImageNet

Data & Code for ACCENTOR Adding Chit-Chat to Enhance Task-Oriented Dialogues

Official code for MPG2: Multi-attribute Pizza Generator: Cross-domain Attribute Control with Conditional StyleGAN

Official implementation for NIPS'17 paper: PredRNN: Recurrent Neural Networks for Predictive Learning Using Spatiotemporal LSTMs.

DR-GAN: Automatic Radial Distortion Rectification Using Conditional GAN in Real-Time

Julia package for contraction of tensor networks, based on the sweep line algorithm outlined in the paper General tensor network decoding of 2D Pauli codes

Official PyTorch code for Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling (HCFlow, ICCV2021)

Code for paper "Extract, Denoise and Enforce: Evaluating and Improving Concept Preservation for Text-to-Text Generation" EMNLP 2021