General Multi-label Image Classification with Transformers

Last update: Dec 21, 2022

Overview

General Multi-label Image Classification with Transformers
Jack Lanchantin, Tianlu Wang, Vicente Ordóñez Román, Yanjun Qi
Conference on Computer Vision and Pattern Recognition (CVPR) 2021

Training and Running C-Tran

Python 3.7 is required. All major packages and their versions are listed in requirements.txt.
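Before training, it can help to confirm that the interpreter and the pinned packages match what the README asks for. The sketch below is not part of the repository. The helper names `is_supported` and `pinned_packages` are hypothetical, and it assumes that requirements.txt uses the common `name==version` pin format and that "3.7 is required" means exactly major.minor 3.7.

```python
import sys

# Assumption: the README's "Python version 3.7" means major.minor == 3.7.
REQUIRED = (3, 7)


def is_supported(version_info=None):
    """Return True when the interpreter's major.minor equals REQUIRED."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == REQUIRED


def pinned_packages(requirements_text):
    """Map package name -> version for every 'name==version' line.

    Comments (after '#') and lines without '==' are ignored.
    """
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins
```

Calling `is_supported()` with no argument checks the running interpreter. Passing the contents of requirements.txt to `pinned_packages` gives a dictionary that can be compared against the installed versions.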

C-Tran on COCO80 Dataset

Download COCO data (19 GB)

wget http://cs.virginia.edu/~jjl5sw/data/vision/coco.tar.gz
mkdir -p data/
tar -xvf coco.tar.gz -C data/

Train New Model

python main.py --batch_size 16 --lr 0.00001 --optim 'adam' --layers 3 --dataset 'coco' --use_lmt --dataroot data/

C-Tran on VOC20 Dataset

Download VOC2007 data (1.7 GB)

wget http://cs.virginia.edu/~jjl5sw/data/vision/voc.tar.gz
mkdir -p data/
tar -xvf voc.tar.gz -C data/

Train New Model

python main.py --batch_size 16 --lr 0.00001 --optim 'adam' --layers 3 --dataset 'voc' --use_lmt --grad_ac_step 2 --dataroot data/
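The VOC command differs from the COCO one by `--grad_ac_step 2`. The README does not document this flag. Assuming it sets the number of gradient-accumulation steps, as the name suggests, the optimizer would update once per 2 batches of 16, for an effective batch size of 32. The pure-Python sketch below is not the repository's training loop. It uses a toy one-parameter model and hypothetical helper names to illustrate why accumulating scaled gradients over equal-sized micro-batches matches the gradient of one larger batch.

```python
# Illustrative only: toy model y ≈ w * x with a mean-squared-error loss.

def grad_mse(w, batch):
    """Gradient of mean((w*x - y)^2) over the batch with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, micro_batches):
    """Sum per-micro-batch gradients, each scaled by 1/steps.

    Assumption: each micro-batch loss is divided by the number of
    accumulation steps before its gradient is added.
    """
    steps = len(micro_batches)
    total = 0.0
    for batch in micro_batches:
        total += grad_mse(w, batch) / steps
    return total


# Hypothetical data: 32 samples split into 2 micro-batches of 16,
# mirroring --batch_size 16 with 2 accumulation steps.
data = [(float(i), 3.0 * i + 1.0) for i in range(32)]
micro_batches = [data[:16], data[16:]]

w = 0.5
full = grad_mse(w, data)                    # one batch of 32
accum = accumulated_grad(w, micro_batches)  # two batches of 16
effective_batch_size = 16 * len(micro_batches)
```

Here `full` and `accum` agree up to floating-point rounding. This equality relies on the micro-batches having equal size. With unequal sizes, the accumulated value becomes a differently weighted average.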

Citing

@article{lanchantin2020general,
  title={General Multi-label Image Classification with Transformers},
  author={Lanchantin, Jack and Wang, Tianlu and Ordonez, Vicente and Qi, Yanjun},
  journal={arXiv preprint arXiv:2011.14027},
  year={2020}
}
