DETReg: Unsupervised Pretraining with Region Priors for Object Detection

Overview

DETReg: Unsupervised Pretraining with Region Priors for Object Detection

Amir Bar, Xin Wang, Vadim Kantorov, Colorado J Reed, Roei Herzig, Gal Chechik, Anna Rohrbach, Trevor Darrell, Amir Globerson

DETReg

This repository is the implementation of DETReg, see Project Page.

Release

  • COCO training code and eval - DONE
  • Pretrained models - DONE
  • Pascal VOC training code and eval- TODO

Introduction

DETReg is an unsupervised pretraining approach for object DEtection with TRansformers using Region priors. Motivated by the two tasks underlying object detection: localization and categorization, we combine two complementary signals for self-supervision. For an object localization signal, we use pseudo ground truth object bounding boxes from an off-the-shelf unsupervised region proposal method, Selective Search, which does not require training data and can detect objects at a high recall rate and very low precision. The categorization signal comes from an object embedding loss that encourages invariant object representations, from which the object category can be inferred. We show how to combine these two signals to train the Deformable DETR detection architecture from large amounts of unlabeled data. DETReg improves the performance over competitive baselines and previous self-supervised methods on standard benchmarks like MS COCO and PASCAL VOC. DETReg also outperforms previous supervised and unsupervised baseline approaches on low-data regime when trained with only 1%, 2%, 5%, and 10% of the labeled data on MS COCO.

Installation

Requirements

  • Linux, CUDA>=9.2, GCC>=5.4

  • Python>=3.7

    We recommend you to use Anaconda to create a conda environment:

    conda create -n detreg python=3.7 pip

    Then, activate the environment:

    conda activate detreg

    Installation: (change cudatoolkit to your cuda version. For detailed pytorch installation instructions click here)

    conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=10.2 -c pytorch
  • Other requirements

    pip install -r requirements.txt

Compiling CUDA operators

cd ./models/ops
sh ./make.sh
# unit test (should see all checking is True)
python test.py

Usage

Dataset preparation

Please download COCO 2017 dataset and ImageNet and organize them as following:

code_root/
└── data/
    ├── ilsvrc/
          ├── train/
          └── val/
    └── MSCoco/
        ├── train2017/
        ├── val2017/
        └── annotations/
        	├── instances_train2017.json
        	└── instances_val2017.json

Note that in this work we used the ImageNet100 dataset, which is x10 smaller than ImageNet. To create ImageNet100 run the following command:

mkdir -p data/ilsvrc100/train
mkdir -p data/ilsvrc100/val
while read line; do ln -s <code_root>/data/ilsvrc/train/$line <code_root>/data/ilsvrc100/train/$line; done < <code_root>/datasets/category.txt
while read line; do ln -s <code_root>/data/ilsvrc/val/$line <code_root>/data/ilsvrc100/val/$line; done < <code_root>/datasets/category.txt

This should results with the following structure:

code_root/
└── data/
    ├── ilsvrc/
          ├── train/
          └── val/
    ├── ilsvrc100/
          ├── train/
          └── val/
    └── MSCoco/
        ├── train2017/
        ├── val2017/
        └── annotations/
        	├── instances_train2017.json
        	└── instances_val2017.json

Create ImageNet Selective Search boxes:

Download the precomputed ImageNet boxes and extract in the cache folder:

mkdir -p /cache/ilsvrc && cd /cache/ilsvrc 
wget https://github.com/amirbar/DETReg/releases/download/1.0.0/ss_box_cache.tar.gz
tar -xf ss_box_cache.tar.gz

Alternatively, you can compute Selective Search boxes yourself:

To create selective search boxes for ImageNet100 on a single machine, run the following command (set num_processes):

python -m datasets.cache_ss --dataset imagenet100 --part 0 --num_m 1 --num_p <num_processes_to_use> 

To speed up the creation of boxes, change the arguments accordingly and run the following command on each different machine:

python -m datasets.cache_ss --dataset imagenet100 --part <machine_number> --num_m <num_machines> --num_p <num_processes_to_use> 

The cached boxes are saved in the following structure:

code_root/
└── cache/
    └── ilsvrc/

Training

The command for pretraining DETReg on 8 GPUs on ImageNet100 is as following:

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/DETReg_top30_in100.sh --batch_size 24 --num_workers 8

Training takes around 1.5 days with 8 NVIDIA V100 GPUs, you can download a pretrained model (see below) if you want to skip this step.

After pretraining, a checkpoint is saved in exps/DETReg_top30_in100/checkpoint.pth. To fine tune it over different coco settings use the following commands: Fine tuning on full COCO (should take 2 days with 8 NVIDIA V100 GPUs):

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/DETReg_fine_tune_full_coco.sh

For smaller subsets which trains faster, you can use smaller number of gpus (e.g 4 with batch size 2)/ Fine tuning on 1%

GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 4 ./configs/DETReg_fine_tune_1pct_coco.sh --batch_size 2

Fine tuning on 2%

GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 4 ./configs/DETReg_fine_tune_2pct_coco.sh --batch_size 2

Fine tuning on 5%

GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 4 ./configs/DETReg_fine_tune_5pct_coco.sh --batch_size 2

Fine tuning on 10%

GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 4 ./configs/DETReg_fine_tune_10pct_coco.sh --batch_size 2

Evaluation

To evaluate a finetuned model, use the following command from the project basedir:

./configs/<config file>.sh --resume exps/<config file>/checkpoint.pth --eval

Pretrained Models

Cite

If you found this code helpful, feel free to cite our work:

@misc{bar2021detreg,
      title={DETReg: Unsupervised Pretraining with Region Priors for Object Detection},
      author={Amir Bar and Xin Wang and Vadim Kantorov and Colorado J Reed and Roei Herzig and Gal Chechik and Anna Rohrbach and Trevor Darrell and Amir Globerson},
      year={2021},
      eprint={2106.04550},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Related Works

If you found DETReg useful, consider checking out these related works as well: ReSim, SwAV, DETR, UP-DETR, and Deformable DETR.

Acknowlegments

DETReg builds on previous works code base such as Deformable DETR and UP-DETR. If you found DETReg useful please consider citing these works as well.

Comments
  • Question about reproducing the Semi-supervised Learning experiment

    Question about reproducing the Semi-supervised Learning experiment

    When i using this checkpoint as pretrain

    image

    and using these script to reproducing the Semi-supervised Learning experiment

    image

    the result turns out to be huge difference :

    image

    Please help me, did i missing anything in reproducing ?

    By the way, i can reproduce the full COCO result @45.5AP. So the conda env is probably right.

    opened by 4-0-4-notfound 5
  • Question about selective search cached boxes in training and validation

    Question about selective search cached boxes in training and validation

    Why are there some '.npy' files for the Imagnet validation set in 'ss_box_cache.tar.gz', for example ILSVRC2012_val_00000006.npy. Are training sets and validation sets used for pretraining?

    opened by CQIITLAB 3
  • RuntimeError: The size of tensor a (512) must match the size of tensor b (128) at non-singleton dimension 3

    RuntimeError: The size of tensor a (512) must match the size of tensor b (128) at non-singleton dimension 3

    Hi, I'm trying to run the pretraining but I receive a mismatch size here https://github.com/amirbar/DETReg/blob/main/models/deformable_detr.py#L328, src_features has a shape of torch.Size([228, 512]) and target_features a shape of torch.Size([228, 3, 128, 128]). Is this ok?

    Start training /home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /opt/conda/conda-bld/pytorch_1623448265233/work/c10/core/TensorImpl.h:1156.) return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode) /home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at /opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/BinaryOps.cpp:467.) return torch.floor_divide(self, other) /home/jossalgon/notebooks/unsupervised/DETReg/models/deformable_detr.py:329: UserWarning: Using a target size (torch.Size([228, 3, 128, 128])) that is different to the input size (torch.Size([228, 512])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size. return {'object_embedding_loss': torch.nn.functional.l1_loss(src_features, target_features, reduction='mean')} Traceback (most recent call last): File "main.py", line 403, in main(args) File "main.py", line 314, in main model, swav_model, criterion, data_loader_train, optimizer, device, epoch, args.clip_max_norm) File "/home/jossalgon/notebooks/unsupervised/DETReg/engine.py", line 50, in train_one_epoch loss_dict = criterion(outputs, targets) File "/home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "/home/jossalgon/notebooks/unsupervised/DETReg/models/deformable_detr.py", line 406, in forward losses.update(self.get_loss(loss, outputs, targets, indices, num_boxes, **kwargs)) File "/home/jossalgon/notebooks/unsupervised/DETReg/models/deformable_detr.py", line 381, in get_loss return loss_map[loss](outputs, targets, indices, num_boxes, **kwargs) File "/home/jossalgon/notebooks/unsupervised/DETReg/models/deformable_detr.py", line 329, in loss_object_embedding_loss return {'object_embedding_loss': torch.nn.functional.l1_loss(src_features, target_features, reduction='mean')} File "/home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/nn/functional.py", line 3058, in l1_loss expanded_input, expanded_target = torch.broadcast_tensors(input, target) File "/home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/functional.py", line 73, in broadcast_tensors return _VF.broadcast_tensors(tensors) # type: ignore[attr-defined] RuntimeError: The size of tensor a (512) must match the size of tensor b (128) at non-singleton dimension 3 Traceback (most recent call last): File "./tools/launch.py", line 192, in main() File "./tools/launch.py", line 188, in main cmd=process.args)

    Using: cudatoolkit 11.1.74 h6bb024c_0 nvidia/linux-64 pytorch 1.9.0 py3.7_cuda11.1_cudnn8.0.5_0 pytorch/linux-64 torchaudio 0.9.0 py37 pytorch/linux-64 torchvision 0.10.0 py37_cu111 pytorch/linux-64

    Thanks and great work!

    opened by jossalgon 3
  • error occured when following the Compiling CUDA operators step.

    error occured when following the Compiling CUDA operators step.

    Hello, when I try to run sh ./make.sh by following the Compiling CUDA operators it always show the error, Traceback (most recent call last): File "setup.py", line 69, in ext_modules=get_extensions(), File "setup.py", line 47, in get_extensions raise NotImplementedError('Cuda is not availabel') NotImplementedError: Cuda is not availabel

    Any idea why this happens? Thanks!

    opened by ruizhaoz 2
  • Can`t install MultiScaleDeformableAttention

    Can`t install MultiScaleDeformableAttention

    Hi, I have seen others have resolved this, but no clues are left behind. I was not able to install the MultiScaleDeformableAttention package from pip or conda, and there is nothing in Readme.

    Please assist. Thank you!

    opened by jshtok 2
  • Results between IN100 and IN1k setting

    Results between IN100 and IN1k setting

    In the arXiv v1 version, the fine-tune result on COCO is 45.5 with IN100 pretrain. But in the arXiv v2 version, it seems the fine-tune result on COCO is still 45.5, but the pretrain dataset is IN1k. So, in my understanding, with more pretrain data, but the fine-tune result is not improved?

    opened by 4-0-4-notfound 2
  • Fine Tuning the Model on a fraction of VOC

    Fine Tuning the Model on a fraction of VOC

    Hi @amirbar,

    Thank You for the great work. It looks like the parameter --filter_pct has never been used in the code. It means the code effectively running fine-tuning on whole VOC/COCO datasets. Please correct me if I am wrong.

    Thanks

    opened by mmaaz60 2
  • It seems in the pretrain stage the network output 90 categories instead of 2

    It seems in the pretrain stage the network output 90 categories instead of 2

    Hello, It seems the network output 90 categories instead of 2, in the pretrain stage. In the paper, it supposes to output 2 categories (either back gourd or foreground), which is not true in the code. I'm so confused, Am i missing something?

    https://github.com/amirbar/DETReg/blob/490e40403860d51c19333b5db53bcd0ee23647ad/configs/DETReg_top30_in100.sh#L8

    https://github.com/amirbar/DETReg/blob/490e40403860d51c19333b5db53bcd0ee23647ad/main.py#L120

    https://github.com/amirbar/DETReg/blob/490e40403860d51c19333b5db53bcd0ee23647ad/models/deformable_detr.py#L497-L503

    opened by 4-0-4-notfound 2
  • Pretrained model on ImageNet-1K

    Pretrained model on ImageNet-1K

    Hi, Thank you for sharing your great work. I am conducting a study on the features of DETReg, and wanted to explore the performance with the pretrained model trained on the full ImageNet. I was wondering if you could share an ImageNet-1K pretrained model?

    Thank you

    opened by hanoonaR 2
  • Bug: Target[

    Bug: Target["area"] incorrect when using selective_search (and possibly others)

    The selective_search function changes the boxes to xyxy coordinates. boxes[..., 2] = boxes[..., 0] + boxes[..., 2] boxes[..., 3] = boxes[..., 1] + boxes[..., 3]

    In [get_item] (https://github.com/amirbar/DETReg/blob/36ae5844183499f6bc1a6d8922427b0f473e06d9/datasets/selfdet.py#L67)
    we have boxes = selective_search(img, h, w, res_size=128) ... target['boxes'] = torch.tensor(boxes) ... target['area'] = target['boxes'][..., 2] * target['boxes'][..., 3]

    But boxes at this point on in xyxy not cxcywh, So the "area" is incorrect. I do not know if this effects anything down the line, it may not.

    opened by AZaitzeff 1
  • What is the difference between 'head' and 'intermediate' in 'obj_embedding_head'?

    What is the difference between 'head' and 'intermediate' in 'obj_embedding_head'?

    https://github.com/amirbar/DETReg/blob/0a258d879d8981b27ab032b83defc6dfcbf07d35/models/backbone.py#L156-L177

    It seems 'head' is the new training setting that uses dim=128 to align features. But dim=512 ('intermediate') is used in the paper. Does it mean that we should change to dim=128 ('head') to achieve better performance of DETReg?

    Thanks.

    opened by Cohesion97 1
  • CVE-2007-4559 Patch

    CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have began a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15 year old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsantized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks to see if all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions you may contact us through this projects lead researcher Kasimir Schulz.

    opened by TrellixVulnTeam 0
  • Fine-tuning based on the DETR architecture code, but the verification indicators are all 0

    Fine-tuning based on the DETR architecture code, but the verification indicators are all 0

    Thanks for your work. I noticed that you open-sourced the detreg of the DETR architecture, and then I tried to use the pre-trained model on the imagenet dataset you provided to fine-tune training for my custom dataset. But I found that all the indicators are still 0 after more than fifty batches of pre-training. I have followed the tips in the related issues of DETR (https://github.com/facebookresearch/detr/issues?page=1&q=zero) , the num_calss was modified. Many people mentioned that DETR requires a large amount of training data, or fine-tuning. But I am currently using fine-tuning, and the number of fine-tuning datasets is about one thousand. But the effect is still very poor, may I ask why. It's normal for me to use deformable-detr architecture. image

    opened by Flyooofly 0
  •   checkpoint_args = torch.load(args.resume, map_location='cpu')['args'] KeyError: 'args'???

    checkpoint_args = torch.load(args.resume, map_location='cpu')['args'] KeyError: 'args'???

    1: checkpoint_args = torch.load(args.resume,map_location='cpu')['args'] KeyError: 'args' I have this kind of error report in the evaluation stage, I don't know how to deal with it, I hope the owner can help me to solve it, thank you very much. 2: Does the test effect of a single GPU appear to be reduced?

    opened by 873552584 0
  • About few-shot object detection

    About few-shot object detection

    I found the result of few-shot object detection is better than others, could you release the few-shot object detection code? or hyperparameters? or how to import novel and base datasets? thanks :)

    opened by YAOSL98 0
  • urllib.error.HTTPError: HTTP Error 403: Forbidden

    urllib.error.HTTPError: HTTP Error 403: Forbidden

    Downloading: "https://dl.fbaipublicfiles.com/deepcluster/swav_800ep_pretrain.pth.tar" to C:\Users\pc/.cache\torch\hub\checkpoints\swav_800ep_pretrain.pth.tar urllib.error.HTTPError: HTTP Error 403: Forbidden hope to solve,thanks

    opened by GDzhu01 0
Releases(1.0.0)
A python library to artfully visualize Factorio Blueprints and an interactive web demo for using it.

Factorio Blueprint Visualizer I love the game Factorio and I really like the look of factories after growing for many hours or blueprints after tweaki

Piet Brömmel 124 Jan 07, 2023
This is an official implementation of the CVPR2022 paper "Blind2Unblind: Self-Supervised Image Denoising with Visible Blind Spots".

Blind2Unblind: Self-Supervised Image Denoising with Visible Blind Spots Blind2Unblind Citing Blind2Unblind @inproceedings{wang2022blind2unblind, tit

demonsjin 58 Dec 06, 2022
Official implementation for "Symbolic Learning to Optimize: Towards Interpretability and Scalability"

Symbolic Learning to Optimize This is the official implementation for ICLR-2022 paper "Symbolic Learning to Optimize: Towards Interpretability and Sca

VITA 8 Dec 19, 2022
Auto HMM: Automatic Discrete and Continous HMM including Model selection

Auto HMM: Automatic Discrete and Continous HMM including Model selection

Chess_champion 29 Dec 07, 2022
LBK 20 Dec 02, 2022
The fundamental package for scientific computing with Python.

NumPy is the fundamental package needed for scientific computing with Python. Website: https://www.numpy.org Documentation: https://numpy.org/doc Mail

NumPy 22.4k Jan 09, 2023
A pytorch implementation of Pytorch-Sketch-RNN

Pytorch-Sketch-RNN A pytorch implementation of https://arxiv.org/abs/1704.03477 In order to draw other things than cats, you will find more drawing da

Alexis David Jacq 172 Dec 12, 2022
Visual dialog agents with pre-trained vision-and-language encoders.

Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation Or READ-UP: Referring Expression Agent Dialog with Unified Pretr

7 Oct 08, 2022
Unsupervised Image Generation with Infinite Generative Adversarial Networks

Unsupervised Image Generation with Infinite Generative Adversarial Networks Here is the implementation of MICGANs using DCGAN architecture on MNIST da

16 Dec 24, 2021
Imbalanced Gradients: A Subtle Cause of Overestimated Adversarial Robustness

Imbalanced Gradients: A Subtle Cause of Overestimated Adversarial Robustness Code for Paper "Imbalanced Gradients: A Subtle Cause of Overestimated Adv

Hanxun Huang 11 Nov 30, 2022
An automated algorithm to extract the linear blend skinning (LBS) from a set of example poses

Dem Bones This repository contains an implementation of Smooth Skinning Decomposition with Rigid Bones, an automated algorithm to extract the Linear B

Electronic Arts 684 Dec 26, 2022
A Tensorflow implementation of BicycleGAN.

BicycleGAN implementation in Tensorflow As part of the implementation series of Joseph Lim's group at USC, our motivation is to accelerate (or sometim

Cognitive Learning for Vision and Robotics (CLVR) lab @ USC 97 Dec 02, 2022
GAN-based Matrix Factorization for Recommender Systems

GAN-based Matrix Factorization for Recommender Systems This repository contains the datasets' splits, the source code of the experiments and their res

Ervin Dervishaj 9 Nov 06, 2022
PyTorch implementation of EGVSR: Efficcient & Generic Video Super-Resolution (VSR)

This is a PyTorch implementation of EGVSR: Efficcient & Generic Video Super-Resolution (VSR), using subpixel convolution to optimize the inference speed of TecoGAN VSR model. Please refer to the offi

789 Jan 04, 2023
Azua - build AI algorithms to aid efficient decision-making with minimum data requirements.

Project Azua 0. Overview Many modern AI algorithms are known to be data-hungry, whereas human decision-making is much more efficient. The human can re

Microsoft 197 Jan 06, 2023
Official repository for the paper "Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks"

Easy-To-Hard The official repository for the paper "Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks". Gett

Avi Schwarzschild 52 Sep 08, 2022
MultiMix: Sparingly Supervised, Extreme Multitask Learning From Medical Images (ISBI 2021, MELBA 2021)

MultiMix This repository contains the implementation of MultiMix. Our publications for this project are listed below: "MultiMix: Sparingly Supervised,

Ayaan Haque 27 Dec 22, 2022
Conjugated Discrete Distributions for Distributional Reinforcement Learning (C2D)

Conjugated Discrete Distributions for Distributional Reinforcement Learning (C2D) Code & Data Appendix for Conjugated Discrete Distributions for Distr

1 Jan 11, 2022
(Personalized) Page-Rank computation using PyTorch

torch-ppr This package allows calculating page-rank and personalized page-rank via power iteration with PyTorch, which also supports calculation on GP

Max Berrendorf 69 Dec 03, 2022
This repository contains the implementation of the paper: Federated Distillation of Natural Language Understanding with Confident Sinkhorns

Federated Distillation of Natural Language Understanding with Confident Sinkhorns This repository provides an alternative method for ensembled distill

Deep Cognition and Language Research (DeCLaRe) Lab 11 Nov 16, 2022