(CVPR 2022) Pytorch implementation of "Self-supervised transformers for unsupervised object discovery using normalized cut"

Overview

(CVPR 2022) TokenCut

Pytorch implementation of Tokencut:

Self-supervised Transformers for Unsupervised Object Discovery using Normalized Cut

Yangtao Wang, Xi Shen, Shell Xu Hu, Yuan Yuan, James L. Crowley, Dominique Vaufreydaz

[Project page] [Paper] Colab demo Hugging Face Spaces

TokenCut teaser

If our project is helpful for your research, please consider citing :

@inproceedings{wang2022tokencut,
          title={Self-supervised Transformers for Unsupervised Object Discovery using Normalized Cut},
          author={Wang, Yangtao and Shen, Xi and Hu, Shell Xu and Yuan, Yuan and Crowley, James L. and Vaufreydaz, Dominique},
          booktitle={Conference on Computer Vision and Pattern Recognition}
          year={2022}
        }

Table of Content

1. Updates

03/10/2022 Creating a 480p Demo using Gradio. Try out the Web Demo: Hugging Face Spaces

Internet image results:

TokenCut visualizations TokenCut visualizations TokenCut visualizations TokenCut visualizations

02/26/2022 Integrated into Huggingface Spaces 🤗 using Gradio. Try out the Web Demo: Hugging Face Spaces

02/26/2022 A simple TokenCut Colab Demo is available.

02/21/2022 Initial commit: Code of TokenCut is released, including evaluation of unsupervised object discovery, unsupervised saliency object detection, weakly supervised object locolization.

2. Installation

2.1 Dependencies

This code was implemented with Python 3.7, PyTorch 1.7.1 and CUDA 11.2. Please refer to the official installation. If CUDA 10.2 has been properly installed :

pip install torch==1.7.1 torchvision==0.8.2

In order to install the additionnal dependencies, please launch the following command:

pip install -r requirements.txt

2.2 Data

We provide quick download commands in DOWNLOAD_DATA.md for VOC2007, VOC2012, COCO, CUB, ImageNet, ECSSD, DUTS and DUT-OMRON as well as DINO checkpoints.

3. Quick Start

3.1 Detecting an object in one image

We provide TokenCut visualization for bounding box prediction and attention map. Using all for all visualization results.

python main_tokencut.py --image_path examples/VOC07_000036.jpg --visualize pred
python main_tokencut.py --image_path examples/VOC07_000036.jpg --visualize attn
python main_tokencut.py --image_path examples/VOC07_000036.jpg --visualize all 

3.2 Segmenting a salient region in one image

We provide TokenCut segmentation results as follows:

cd unsupervised_saliency_detection 
python get_saliency.py --sigma-spatial 16 --sigma-luma 16 --sigma-chroma 8 --vit-arch small --patch-size 16 --img-path ../examples/VOC07_000036.jpg --out-dir ./output

4. Evaluation

Following are the different steps to reproduce the results of TokenCut presented in the paper.

4.1 Unsupervised object discovery

TokenCut visualizations TokenCut visualizations TokenCut visualizations

PASCAL-VOC

In order to apply TokenCut and compute corloc results (VOC07 68.8, VOC12 72.1), please launch:

python main_tokencut.py --dataset VOC07 --set trainval
python main_tokencut.py --dataset VOC12 --set trainval

If you want to extract Dino features, which corresponds to the KEY features in DINO:

mkdir features
python main_lost.py --dataset VOC07 --set trainval --save-feat-dir features/VOC2007

COCO

Results are provided given the 2014 annotations following previous works. The following command line allows you to get results on the subset of 20k images of the COCO dataset (corloc 58.8), following previous litterature. To be noted that the 20k images are a subset of the train set.

python main_tokencut.py --dataset COCO20k --set train

Different models

We have tested the method on different setups of the VIT model, corloc results are presented in the following table (more can be found in the paper).

arch pre-training dataset
VOC07 VOC12 COCO20k
ViT-S/16 DINO 68.8 72.1 58.8
ViT-S/8 DINO 67.3 71.6 60.7
ViT-B/16 DINO 68.8 72.4 59.0

Previous results on the dataset VOC07 can be obtained by launching:

python main_tokencut.py --dataset VOC07 --set trainval #VIT-S/16
python main_tokencut.py --dataset VOC07 --set trainval --patch_size 8 #VIT-S/8
python main_tokencut.py --dataset VOC07 --set trainval --arch vit_base #VIT-B/16

4.2 Unsupervised saliency detection

TokenCut visualizations TokenCut visualizations TokenCut visualizations

To evaluate on ECSSD, DUTS, DUT_OMRON dataset:

python get_saliency.py --out-dir ECSSD --sigma-spatial 16 --sigma-luma 16 --sigma-chroma 8 --nb-vis 1 --vit-arch small --patch-size 16 --dataset ECSSD

python get_saliency.py --out-dir DUTS --sigma-spatial 16 --sigma-luma 16 --sigma-chroma 8 --nb-vis 1 --vit-arch small --patch-size 16 --dataset DUTS

python get_saliency.py --out-dir DUT --sigma-spatial 16 --sigma-luma 16 --sigma-chroma 8 --nb-vis 1 --vit-arch small --patch-size 16 --dataset DUT

This should give:

Method ECSSD DUTS DUT-OMRON
maxF IoU Acc maxF IoU Acc maxF IoU Acc
TokenCut 80.3 71.2 91.8 67.2 57.6 90.3 60.0 53.3 88.0
TokenCut + BS 87.4 77.2 93.4 75.5 62,4 91.4 69.7 61.8 89.7

4.3 Weakly supervised object detection

TokenCut visualizations TokenCut visualizations TokenCut visualizations

Fintune DINO small on CUB

To finetune ViT-S/16 on CUB on a single node with 4 gpus for 1000 epochs run:

python -m torch.distributed.launch --nproc_per_node=4 main.py --data_path /path/to/data --batch_size_per_gpu 256 --dataset cub --weight_decay 0.005 --pretrained_weights ./dino_deitsmall16_pretrain.pth --epoch 1000 --output_dir ./path/to/checkpoin --lr 2e-4 --warmup-epochs 50 --num_labels 200 --num_workers 16 --n_last_blocks 1 --avgpool_patchtokens true --arch vit_small --patch_size 16

Evaluation on CUB

To evaluate a fine-tuned ViT-S/16 on CUB val with a single GPU run:

python eval.py --pretrained_weights ./path/to/checkpoint --dataset cub --data_path ./path/to/data --batch_size_per_gpu 1 --no_center_crop

This should give:

Top1_cls: 79.12, top5_cls94.80, gt_loc: 0.914, top1_loc:0.723

Evaluate on Imagenet

To Evaluate ViT-S/16 finetuned on ImageNet val with a single GPU run:

python eval.py --pretrained_weights /path/to/checkpoint --classifier_weights /path/to/linear_weights--dataset imagenet --data_path ./path/to/data --batch_size_per_gpu 1 --num_labels 1000 --batch_size_per_gpu 1 --no_center_crop --input_size 256 --tau 0.2 --patch_size 16 --arch vit_small

5. Acknowledgement

TokenCut code is built on top of LOST, DINO, Segswap, and Bilateral_Sovlver. We would like to sincerely thanks those authors for their great works.

Owner
YANGTAO WANG
PhD, Computer Vision, Deep Learning
YANGTAO WANG
Instant-nerf-pytorch - NeRF trained SUPER FAST in pytorch

instant-nerf-pytorch This is WORK IN PROGRESS, please feel free to contribute vi

94 Nov 22, 2022
The Dual Memory is build from a simple CNN for the deep memory and Linear Regression fro the fast Memory

Simple-DMA a simple Dual Memory Architecture for classifications. based on the paper Dual-Memory Deep Learning Architectures for Lifelong Learning of

1 Jan 27, 2022
ByteTrack with ReID module following the paradigm of FairMOT, tracking strategy is borrowed from FairMOT/JDE.

ByteTrack_ReID ByteTrack is the SOTA tracker in MOT benchmarks with strong detector YOLOX and a simple association strategy only based on motion infor

Han GuangXin 46 Dec 29, 2022
unet-family: Ultimate version

unet-family: Ultimate version 基于之前my-unet代码,我整理出来了这一份终极版本unet-family,方便其他人阅读。 相比于之前的my-unet代码,代码分类更加规范,有条理 对于clone下来的代码不需要修改各种复杂繁琐的路径问题,直接就可以运行。 并且代码有

2 Sep 19, 2022
The toolkit to generate auto labeled datasets

Ozeu Ozeu is the toolkit to autolabal dataset for instance segmentation. You can generate datasets labaled with segmentation mask and bounding box fro

Xiong Jie 28 Mar 28, 2022
A machine learning benchmark of in-the-wild distribution shifts, with data loaders, evaluators, and default models.

WILDS is a benchmark of in-the-wild distribution shifts spanning diverse data modalities and applications, from tumor identification to wildlife monitoring to poverty mapping.

P-Lambda 437 Dec 30, 2022
《Improving Unsupervised Image Clustering With Robust Learning》(2020)

Improving Unsupervised Image Clustering With Robust Learning This repo is the PyTorch codes for "Improving Unsupervised Image Clustering With Robust L

Sungwon Park 129 Dec 27, 2022
A Novel Incremental Learning Driven Instance Segmentation Framework to Recognize Highly Cluttered Instances of the Contraband Items

A Novel Incremental Learning Driven Instance Segmentation Framework to Recognize Highly Cluttered Instances of the Contraband Items This repository co

Taimur Hassan 3 Mar 16, 2022
State of the Art Neural Networks for Generative Deep Learning

pyradox-generative State of the Art Neural Networks for Generative Deep Learning Table of Contents pyradox-generative Table of Contents Installation U

Ritvik Rastogi 8 Sep 29, 2022
AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data

AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data [WIP] Unofficial Pytorch implementation of AdaSpeech 2. Requirements : All code written i

Rishikesh (ऋषिकेश) 63 Dec 28, 2022
Distributing reference energies for SMIRNOFF implementations

Warning: This code is currently experimental and under active development. Is it not yet suitable for distribution or use as reference implementation.

Open Force Field Initiative 1 Dec 07, 2021
Code-free deep segmentation for computational pathology

NoCodeSeg: Deep segmentation made easy! This is the official repository for the manuscript "Code-free development and deployment of deep segmentation

André Pedersen 26 Nov 23, 2022
Non-Metric Space Library (NMSLIB): An efficient similarity search library and a toolkit for evaluation of k-NN methods for generic non-metric spaces.

Non-Metric Space Library (NMSLIB) Important Notes NMSLIB is generic but fast, see the results of ANN benchmarks. A standalone implementation of our fa

2.9k Jan 04, 2023
Official code of ICCV2021 paper "Residual Attention: A Simple but Effective Method for Multi-Label Recognition"

CSRA This is the official code of ICCV 2021 paper: Residual Attention: A Simple But Effective Method for Multi-Label Recoginition Demo, Train and Vali

163 Dec 22, 2022
DeepCAD: A Deep Generative Network for Computer-Aided Design Models

DeepCAD This repository provides source code for our paper: DeepCAD: A Deep Generative Network for Computer-Aided Design Models Rundi Wu, Chang Xiao,

Rundi Wu 85 Dec 31, 2022
This repository implements WGAN_GP.

Image_WGAN_GP This repository implements WGAN_GP. Image_WGAN_GP This repository uses wgan to generate mnist and fashionmnist pictures. Firstly, you ca

Lieon 6 Dec 10, 2021
Libraries, tools and tasks created and used at DeepMind Robotics.

Libraries, tools and tasks created and used at DeepMind Robotics.

DeepMind 270 Nov 30, 2022
PyTorch Implementation of Realtime Multi-Person Pose Estimation project.

PyTorch Realtime Multi-Person Pose Estimation This is a pytorch version of Realtime_Multi-Person_Pose_Estimation, origin code is here Realtime_Multi-P

Dave Fang 157 Nov 12, 2022
Official implementation of the PICASO: Permutation-Invariant Cascaded Attentional Set Operator

PICASO Official PyTorch implemetation for the paper PICASO:Permutation-Invariant Cascaded Attentive Set Operator. Requirements Python 3 torch = 1.0 n

Samira Zare 0 Dec 23, 2021
Home repository for the Regularized Greedy Forest (RGF) library. It includes original implementation from the paper and multithreaded one written in C++, along with various language-specific wrappers.

Regularized Greedy Forest Regularized Greedy Forest (RGF) is a tree ensemble machine learning method described in this paper. RGF can deliver better r

RGF-team 364 Dec 28, 2022