Official implementation of the paper "Query2Label: A Simple Transformer Way to Multi-Label Classification".

Last update: Dec 28, 2022


Introduction

This is the official implementation of the paper "Query2Label: A Simple Transformer Way to Multi-Label Classification".

Abstract

This paper presents a simple and effective approach to solving the multi-label classification problem. The proposed approach leverages Transformer decoders to query the existence of a class label. The use of Transformer is rooted in the need of extracting local discriminative features adaptively for different labels, which is a strongly desired property due to the existence of multiple objects in one image. The built-in cross-attention module in the Transformer decoder offers an effective way to use label embeddings as queries to probe and pool class-related features from a feature map computed by a vision backbone for subsequent binary classifications. Compared with prior works, the new framework is simple, using standard Transformers and vision backbones, and effective, consistently outperforming all previous works on five multi-label classification data sets, including MS-COCO, PASCAL VOC, NUS-WIDE, and Visual Genome. Particularly, we establish 91.3% mAP on MS-COCO. We hope its compact structure, simple implementation, and superior performance serve as a strong baseline for multi-label classification tasks and future studies.
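The querying mechanism described in the abstract can be sketched in a few lines of PyTorch. This is a minimal illustration, not the code in this repo: the class name `LabelQueryHead`, the layer sizes, and the per-class linear projection are assumptions made for the example, and positional encodings are left out for brevity. It only shows how learnable label embeddings can act as decoder queries that cross-attend to a flattened backbone feature map and yield one logit per label.

```python
import torch
import torch.nn as nn


class LabelQueryHead(nn.Module):
    """Hypothetical sketch: label embeddings query a backbone feature map."""

    def __init__(self, num_classes, feat_channels, d_model=256, nhead=4, num_layers=1):
        super().__init__()
        # Project backbone channels down to the Transformer width.
        self.input_proj = nn.Conv2d(feat_channels, d_model, kernel_size=1)
        # One learnable query embedding per class label.
        self.label_queries = nn.Embedding(num_classes, d_model)
        layer = nn.TransformerDecoderLayer(
            d_model=d_model, nhead=nhead, dim_feedforward=4 * d_model, dropout=0.1
        )
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        # A separate linear projection per class turns each pooled feature into a logit.
        self.class_weight = nn.Parameter(torch.empty(num_classes, d_model))
        self.class_bias = nn.Parameter(torch.zeros(num_classes))
        nn.init.xavier_uniform_(self.class_weight)

    def forward(self, feature_map):
        # feature_map: (B, C, H, W) from any vision backbone.
        batch = feature_map.shape[0]
        # Flatten spatial positions into a sequence: (H*W, B, D).
        memory = self.input_proj(feature_map).flatten(2).permute(2, 0, 1)
        # Repeat the label queries for every image: (K, B, D).
        queries = self.label_queries.weight.unsqueeze(1).repeat(1, batch, 1)
        # Cross-attention pools class-related features for each label: (K, B, D).
        pooled = self.decoder(queries, memory)
        # One logit per (image, label) pair: (B, K).
        return (pooled.permute(1, 0, 2) * self.class_weight).sum(-1) + self.class_bias
```

Applying a sigmoid to the returned logits gives one independent probability per label, which matches the "subsequent binary classifications" wording of the abstract.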

Results on MS-COCO:

Quick start

(optional) Star this repo.
Clone this repo:

git clone git@github.com:SlongLiu/query2labels.git
cd query2labels

Install CUDA, PyTorch, and torchvision.

Please make sure the versions are compatible with each other. We tested our models in two environments; other configurations may also work:

cuda==11, torch==1.9.0, torchvision==0.10.0, python==3.7.3
or
cuda==10.2, torch==1.6.0, torchvision==0.7.0, python==3.7.3
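
If you want to check your setup against the two tested combinations, a small helper along these lines may be handy. It is only a sketch: the function names are made up for this example, and the check is a plain string comparison of the versions listed above (local build suffixes such as `+cu111` are ignored).

```python
# Tested combinations copied from the list above; everything else is illustrative.
TESTED_ENVS = [
    {"cuda": "11", "torch": "1.9.0", "torchvision": "0.10.0"},
    {"cuda": "10.2", "torch": "1.6.0", "torchvision": "0.7.0"},
]


def _base(version):
    """Drop a local build suffix such as '+cu111' from a version string."""
    return version.split("+", 1)[0]


def matches_tested_env(cuda, torch_version, torchvision_version):
    """Return True if the given versions match one of the tested combinations."""
    for env in TESTED_ENVS:
        # Accept "11.1" for the "11" entry, but require an exact match otherwise.
        cuda_ok = cuda == env["cuda"] or cuda.startswith(env["cuda"] + ".")
        torch_ok = _base(torch_version) == env["torch"]
        vision_ok = _base(torchvision_version) == env["torchvision"]
        if cuda_ok and torch_ok and vision_ok:
            return True
    return False
```

With PyTorch installed, the three arguments can be read from `torch.version.cuda`, `torch.__version__`, and `torchvision.__version__`. A `False` result does not mean the code will fail, only that the combination is not one of the two listed here.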

Install other needed packages.

pip install -r requirments.txt

Data preparation.

Download MS-COCO 2014 and update the dataset paths in lib/dataset/cocodataset.py (lines 24 and 25).
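
Before editing the paths, it can help to confirm that the download is complete. The sketch below assumes the folder and file names from the official COCO 2014 archives (`train2014`, `val2014`, and the `instances_*2014.json` annotation files); the names this repo actually reads are set in lib/dataset/cocodataset.py and may differ. The helper name `missing_coco_paths` is made up for this example.

```python
from pathlib import Path

# Assumed layout of the official COCO 2014 archives; adjust to your setup.
EXPECTED_COCO2014 = (
    "train2014",
    "val2014",
    "annotations/instances_train2014.json",
    "annotations/instances_val2014.json",
)


def missing_coco_paths(root):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_COCO2014 if not (root / rel).exists()]
```

An empty list means every assumed entry was found under the given root.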

Download pretrained models.

You can download pretrained models from this link. See the pretrained model table below for details.

Run!

python q2l_infer.py -a modelname --config /path/to/json/file --resume /path/to/pkl/file [other args]
e.g.
python q2l_infer.py -a 'Q2L-R101-448' --config "pretrained/Q2L-R101-448/config_new.json" -b 16 --resume 'pretrained/Q2L-R101-448/checkpoint.pkl'

Pretrained models

Model name	mAP (%)	Link (Tsinghua Cloud)
Q2L-R101-448	84.9	this link
Q2L-R101-576	86.5	this link
Q2L-TResL-448	87.3	this link
Q2L-TResL_22k-448	89.2	this link
Q2L-SwinL-384	90.5	this link
Q2L-CvT_w24-384	91.3	this link
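
For readers unfamiliar with the metric, the mAP column is a mean of per-class average precision values. The sketch below uses the common non-interpolated definition (mean of the precision at the rank of each positive sample) in plain Python. It is meant to illustrate the metric and may differ from this repo's evaluation code in details such as tie handling.

```python
def average_precision(scores, targets):
    """AP for one class: mean of the precision at the rank of each positive."""
    ranked = sorted(zip(scores, targets), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            precision_sum += hits / rank
    # A class with no positives contributes 0 in this sketch.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(score_rows, target_rows):
    """mAP in percent; rows are images, columns are classes."""
    num_classes = len(score_rows[0])
    aps = [
        average_precision(
            [row[c] for row in score_rows],
            [row[c] for row in target_rows],
        )
        for c in range(num_classes)
    ]
    return 100.0 * sum(aps) / num_classes
```

Here `score_rows` holds the predicted score of every class for every image, and `target_rows` holds the matching 0/1 ground-truth labels.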

Training

Training scripts will be available later.

BibTeX

@misc{liu2021query2label,
      title={Query2Label: A Simple Transformer Way to Multi-Label Classification}, 
      author={Shilong Liu and Lei Zhang and Xiao Yang and Hang Su and Jun Zhu},
      year={2021},
      eprint={2107.10834},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgement

We thank the authors of ASL, TResNet, DETR, CvT, and Swin-Transformer for their great work and code. Thanks to @mrT23 for sharing training tricks and providing a useful training script.


Owner

Shilong Liu
