DETReg: Unsupervised Pretraining with Region Priors for Object Detection

Overview

DETReg: Unsupervised Pretraining with Region Priors for Object Detection

Amir Bar, Xin Wang, Vadim Kantorov, Colorado J Reed, Roei Herzig, Gal Chechik, Anna Rohrbach, Trevor Darrell, Amir Globerson

DETReg

This repository is the implementation of DETReg, see Project Page.

Release

  • COCO training code and eval - DONE
  • Pretrained models - DONE
  • Pascal VOC training code and eval- TODO

Introduction

DETReg is an unsupervised pretraining approach for object DEtection with TRansformers using Region priors. Motivated by the two tasks underlying object detection: localization and categorization, we combine two complementary signals for self-supervision. For an object localization signal, we use pseudo ground truth object bounding boxes from an off-the-shelf unsupervised region proposal method, Selective Search, which does not require training data and can detect objects at a high recall rate and very low precision. The categorization signal comes from an object embedding loss that encourages invariant object representations, from which the object category can be inferred. We show how to combine these two signals to train the Deformable DETR detection architecture from large amounts of unlabeled data. DETReg improves the performance over competitive baselines and previous self-supervised methods on standard benchmarks like MS COCO and PASCAL VOC. DETReg also outperforms previous supervised and unsupervised baseline approaches on low-data regime when trained with only 1%, 2%, 5%, and 10% of the labeled data on MS COCO.

Installation

Requirements

  • Linux, CUDA>=9.2, GCC>=5.4

  • Python>=3.7

    We recommend you to use Anaconda to create a conda environment:

    conda create -n detreg python=3.7 pip

    Then, activate the environment:

    conda activate detreg

    Installation: (change cudatoolkit to your cuda version. For detailed pytorch installation instructions click here)

    conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=10.2 -c pytorch
  • Other requirements

    pip install -r requirements.txt

Compiling CUDA operators

cd ./models/ops
sh ./make.sh
# unit test (should see all checking is True)
python test.py

Usage

Dataset preparation

Please download COCO 2017 dataset and ImageNet and organize them as following:

code_root/
└── data/
    ├── ilsvrc/
          ├── train/
          └── val/
    └── MSCoco/
        ├── train2017/
        ├── val2017/
        └── annotations/
        	├── instances_train2017.json
        	└── instances_val2017.json

Note that in this work we used the ImageNet100 dataset, which is x10 smaller than ImageNet. To create ImageNet100 run the following command:

mkdir -p data/ilsvrc100/train
mkdir -p data/ilsvrc100/val
while read line; do ln -s <code_root>/data/ilsvrc/train/$line <code_root>/data/ilsvrc100/train/$line; done < <code_root>/datasets/category.txt
while read line; do ln -s <code_root>/data/ilsvrc/val/$line <code_root>/data/ilsvrc100/val/$line; done < <code_root>/datasets/category.txt

This should results with the following structure:

code_root/
└── data/
    ├── ilsvrc/
          ├── train/
          └── val/
    ├── ilsvrc100/
          ├── train/
          └── val/
    └── MSCoco/
        ├── train2017/
        ├── val2017/
        └── annotations/
        	├── instances_train2017.json
        	└── instances_val2017.json

Create ImageNet Selective Search boxes:

Download the precomputed ImageNet boxes and extract in the cache folder:

mkdir -p /cache/ilsvrc && cd /cache/ilsvrc 
wget https://github.com/amirbar/DETReg/releases/download/1.0.0/ss_box_cache.tar.gz
tar -xf ss_box_cache.tar.gz

Alternatively, you can compute Selective Search boxes yourself:

To create selective search boxes for ImageNet100 on a single machine, run the following command (set num_processes):

python -m datasets.cache_ss --dataset imagenet100 --part 0 --num_m 1 --num_p <num_processes_to_use> 

To speed up the creation of boxes, change the arguments accordingly and run the following command on each different machine:

python -m datasets.cache_ss --dataset imagenet100 --part <machine_number> --num_m <num_machines> --num_p <num_processes_to_use> 

The cached boxes are saved in the following structure:

code_root/
└── cache/
    └── ilsvrc/

Training

The command for pretraining DETReg on 8 GPUs on ImageNet100 is as following:

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/DETReg_top30_in100.sh --batch_size 24 --num_workers 8

Training takes around 1.5 days with 8 NVIDIA V100 GPUs, you can download a pretrained model (see below) if you want to skip this step.

After pretraining, a checkpoint is saved in exps/DETReg_top30_in100/checkpoint.pth. To fine tune it over different coco settings use the following commands: Fine tuning on full COCO (should take 2 days with 8 NVIDIA V100 GPUs):

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/DETReg_fine_tune_full_coco.sh

For smaller subsets which trains faster, you can use smaller number of gpus (e.g 4 with batch size 2)/ Fine tuning on 1%

GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 4 ./configs/DETReg_fine_tune_1pct_coco.sh --batch_size 2

Fine tuning on 2%

GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 4 ./configs/DETReg_fine_tune_2pct_coco.sh --batch_size 2

Fine tuning on 5%

GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 4 ./configs/DETReg_fine_tune_5pct_coco.sh --batch_size 2

Fine tuning on 10%

GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 4 ./configs/DETReg_fine_tune_10pct_coco.sh --batch_size 2

Evaluation

To evaluate a finetuned model, use the following command from the project basedir:

./configs/<config file>.sh --resume exps/<config file>/checkpoint.pth --eval

Pretrained Models

Cite

If you found this code helpful, feel free to cite our work:

@misc{bar2021detreg,
      title={DETReg: Unsupervised Pretraining with Region Priors for Object Detection},
      author={Amir Bar and Xin Wang and Vadim Kantorov and Colorado J Reed and Roei Herzig and Gal Chechik and Anna Rohrbach and Trevor Darrell and Amir Globerson},
      year={2021},
      eprint={2106.04550},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Related Works

If you found DETReg useful, consider checking out these related works as well: ReSim, SwAV, DETR, UP-DETR, and Deformable DETR.

Acknowlegments

DETReg builds on previous works code base such as Deformable DETR and UP-DETR. If you found DETReg useful please consider citing these works as well.

Comments
  • Question about reproducing the Semi-supervised Learning experiment

    Question about reproducing the Semi-supervised Learning experiment

    When i using this checkpoint as pretrain

    image

    and using these script to reproducing the Semi-supervised Learning experiment

    image

    the result turns out to be huge difference :

    image

    Please help me, did i missing anything in reproducing ?

    By the way, i can reproduce the full COCO result @45.5AP. So the conda env is probably right.

    opened by 4-0-4-notfound 5
  • Question about selective search cached boxes in training and validation

    Question about selective search cached boxes in training and validation

    Why are there some '.npy' files for the Imagnet validation set in 'ss_box_cache.tar.gz', for example ILSVRC2012_val_00000006.npy. Are training sets and validation sets used for pretraining?

    opened by CQIITLAB 3
  • RuntimeError: The size of tensor a (512) must match the size of tensor b (128) at non-singleton dimension 3

    RuntimeError: The size of tensor a (512) must match the size of tensor b (128) at non-singleton dimension 3

    Hi, I'm trying to run the pretraining but I receive a mismatch size here https://github.com/amirbar/DETReg/blob/main/models/deformable_detr.py#L328, src_features has a shape of torch.Size([228, 512]) and target_features a shape of torch.Size([228, 3, 128, 128]). Is this ok?

    Start training /home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /opt/conda/conda-bld/pytorch_1623448265233/work/c10/core/TensorImpl.h:1156.) return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode) /home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at /opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/BinaryOps.cpp:467.) return torch.floor_divide(self, other) /home/jossalgon/notebooks/unsupervised/DETReg/models/deformable_detr.py:329: UserWarning: Using a target size (torch.Size([228, 3, 128, 128])) that is different to the input size (torch.Size([228, 512])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size. return {'object_embedding_loss': torch.nn.functional.l1_loss(src_features, target_features, reduction='mean')} Traceback (most recent call last): File "main.py", line 403, in main(args) File "main.py", line 314, in main model, swav_model, criterion, data_loader_train, optimizer, device, epoch, args.clip_max_norm) File "/home/jossalgon/notebooks/unsupervised/DETReg/engine.py", line 50, in train_one_epoch loss_dict = criterion(outputs, targets) File "/home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "/home/jossalgon/notebooks/unsupervised/DETReg/models/deformable_detr.py", line 406, in forward losses.update(self.get_loss(loss, outputs, targets, indices, num_boxes, **kwargs)) File "/home/jossalgon/notebooks/unsupervised/DETReg/models/deformable_detr.py", line 381, in get_loss return loss_map[loss](outputs, targets, indices, num_boxes, **kwargs) File "/home/jossalgon/notebooks/unsupervised/DETReg/models/deformable_detr.py", line 329, in loss_object_embedding_loss return {'object_embedding_loss': torch.nn.functional.l1_loss(src_features, target_features, reduction='mean')} File "/home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/nn/functional.py", line 3058, in l1_loss expanded_input, expanded_target = torch.broadcast_tensors(input, target) File "/home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/functional.py", line 73, in broadcast_tensors return _VF.broadcast_tensors(tensors) # type: ignore[attr-defined] RuntimeError: The size of tensor a (512) must match the size of tensor b (128) at non-singleton dimension 3 Traceback (most recent call last): File "./tools/launch.py", line 192, in main() File "./tools/launch.py", line 188, in main cmd=process.args)

    Using: cudatoolkit 11.1.74 h6bb024c_0 nvidia/linux-64 pytorch 1.9.0 py3.7_cuda11.1_cudnn8.0.5_0 pytorch/linux-64 torchaudio 0.9.0 py37 pytorch/linux-64 torchvision 0.10.0 py37_cu111 pytorch/linux-64

    Thanks and great work!

    opened by jossalgon 3
  • error occured when following the Compiling CUDA operators step.

    error occured when following the Compiling CUDA operators step.

    Hello, when I try to run sh ./make.sh by following the Compiling CUDA operators it always show the error, Traceback (most recent call last): File "setup.py", line 69, in ext_modules=get_extensions(), File "setup.py", line 47, in get_extensions raise NotImplementedError('Cuda is not availabel') NotImplementedError: Cuda is not availabel

    Any idea why this happens? Thanks!

    opened by ruizhaoz 2
  • Can`t install MultiScaleDeformableAttention

    Can`t install MultiScaleDeformableAttention

    Hi, I have seen others have resolved this, but no clues are left behind. I was not able to install the MultiScaleDeformableAttention package from pip or conda, and there is nothing in Readme.

    Please assist. Thank you!

    opened by jshtok 2
  • Results between IN100 and IN1k setting

    Results between IN100 and IN1k setting

    In the arXiv v1 version, the fine-tune result on COCO is 45.5 with IN100 pretrain. But in the arXiv v2 version, it seems the fine-tune result on COCO is still 45.5, but the pretrain dataset is IN1k. So, in my understanding, with more pretrain data, but the fine-tune result is not improved?

    opened by 4-0-4-notfound 2
  • Fine Tuning the Model on a fraction of VOC

    Fine Tuning the Model on a fraction of VOC

    Hi @amirbar,

    Thank You for the great work. It looks like the parameter --filter_pct has never been used in the code. It means the code effectively running fine-tuning on whole VOC/COCO datasets. Please correct me if I am wrong.

    Thanks

    opened by mmaaz60 2
  • It seems in the pretrain stage the network output 90 categories instead of 2

    It seems in the pretrain stage the network output 90 categories instead of 2

    Hello, It seems the network output 90 categories instead of 2, in the pretrain stage. In the paper, it supposes to output 2 categories (either back gourd or foreground), which is not true in the code. I'm so confused, Am i missing something?

    https://github.com/amirbar/DETReg/blob/490e40403860d51c19333b5db53bcd0ee23647ad/configs/DETReg_top30_in100.sh#L8

    https://github.com/amirbar/DETReg/blob/490e40403860d51c19333b5db53bcd0ee23647ad/main.py#L120

    https://github.com/amirbar/DETReg/blob/490e40403860d51c19333b5db53bcd0ee23647ad/models/deformable_detr.py#L497-L503

    opened by 4-0-4-notfound 2
  • Pretrained model on ImageNet-1K

    Pretrained model on ImageNet-1K

    Hi, Thank you for sharing your great work. I am conducting a study on the features of DETReg, and wanted to explore the performance with the pretrained model trained on the full ImageNet. I was wondering if you could share an ImageNet-1K pretrained model?

    Thank you

    opened by hanoonaR 2
  • Bug: Target[

    Bug: Target["area"] incorrect when using selective_search (and possibly others)

    The selective_search function changes the boxes to xyxy coordinates. boxes[..., 2] = boxes[..., 0] + boxes[..., 2] boxes[..., 3] = boxes[..., 1] + boxes[..., 3]

    In [get_item] (https://github.com/amirbar/DETReg/blob/36ae5844183499f6bc1a6d8922427b0f473e06d9/datasets/selfdet.py#L67)
    we have boxes = selective_search(img, h, w, res_size=128) ... target['boxes'] = torch.tensor(boxes) ... target['area'] = target['boxes'][..., 2] * target['boxes'][..., 3]

    But boxes at this point on in xyxy not cxcywh, So the "area" is incorrect. I do not know if this effects anything down the line, it may not.

    opened by AZaitzeff 1
  • What is the difference between 'head' and 'intermediate' in 'obj_embedding_head'?

    What is the difference between 'head' and 'intermediate' in 'obj_embedding_head'?

    https://github.com/amirbar/DETReg/blob/0a258d879d8981b27ab032b83defc6dfcbf07d35/models/backbone.py#L156-L177

    It seems 'head' is the new training setting that uses dim=128 to align features. But dim=512 ('intermediate') is used in the paper. Does it mean that we should change to dim=128 ('head') to achieve better performance of DETReg?

    Thanks.

    opened by Cohesion97 1
  • CVE-2007-4559 Patch

    CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have began a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15 year old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsantized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks to see if all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions you may contact us through this projects lead researcher Kasimir Schulz.

    opened by TrellixVulnTeam 0
  • Fine-tuning based on the DETR architecture code, but the verification indicators are all 0

    Fine-tuning based on the DETR architecture code, but the verification indicators are all 0

    Thanks for your work. I noticed that you open-sourced the detreg of the DETR architecture, and then I tried to use the pre-trained model on the imagenet dataset you provided to fine-tune training for my custom dataset. But I found that all the indicators are still 0 after more than fifty batches of pre-training. I have followed the tips in the related issues of DETR (https://github.com/facebookresearch/detr/issues?page=1&q=zero) , the num_calss was modified. Many people mentioned that DETR requires a large amount of training data, or fine-tuning. But I am currently using fine-tuning, and the number of fine-tuning datasets is about one thousand. But the effect is still very poor, may I ask why. It's normal for me to use deformable-detr architecture. image

    opened by Flyooofly 0
  •   checkpoint_args = torch.load(args.resume, map_location='cpu')['args'] KeyError: 'args'???

    checkpoint_args = torch.load(args.resume, map_location='cpu')['args'] KeyError: 'args'???

    1: checkpoint_args = torch.load(args.resume,map_location='cpu')['args'] KeyError: 'args' I have this kind of error report in the evaluation stage, I don't know how to deal with it, I hope the owner can help me to solve it, thank you very much. 2: Does the test effect of a single GPU appear to be reduced?

    opened by 873552584 0
  • About few-shot object detection

    About few-shot object detection

    I found the result of few-shot object detection is better than others, could you release the few-shot object detection code? or hyperparameters? or how to import novel and base datasets? thanks :)

    opened by YAOSL98 0
  • urllib.error.HTTPError: HTTP Error 403: Forbidden

    urllib.error.HTTPError: HTTP Error 403: Forbidden

    Downloading: "https://dl.fbaipublicfiles.com/deepcluster/swav_800ep_pretrain.pth.tar" to C:\Users\pc/.cache\torch\hub\checkpoints\swav_800ep_pretrain.pth.tar urllib.error.HTTPError: HTTP Error 403: Forbidden hope to solve,thanks

    opened by GDzhu01 0
Releases(1.0.0)
Advancing Self-supervised Monocular Depth Learning with Sparse LiDAR

Official implementation for paper "Advancing Self-supervised Monocular Depth Learning with Sparse LiDAR"

Ziyue Feng 72 Dec 09, 2022
Let's create a tool to convert Thailand budget from PDF to CSV.

thailand-budget-pdf2csv Let's create a tool to convert Thailand Government Budgeting from PDF to CSV! รวมพลัง Dev แปลงงบ จาก PDF สู่ Machine-readable

Kao.Geek 88 Dec 19, 2022
Code for our WACV 2022 paper "Hyper-Convolution Networks for Biomedical Image Segmentation"

Hyper-Convolution Networks for Biomedical Image Segmentation Code for our WACV 2022 paper "Hyper-Convolution Networks for Biomedical Image Segmentatio

Tianyu Ma 17 Nov 02, 2022
A PyTorch Implementation of Gated Graph Sequence Neural Networks (GGNN)

A PyTorch Implementation of GGNN This is a PyTorch implementation of the Gated Graph Sequence Neural Networks (GGNN) as described in the paper Gated G

Ching-Yao Chuang 427 Dec 13, 2022
This is Official implementation for "Pose-guided Feature Disentangling for Occluded Person Re-Identification Based on Transformer" in AAAI2022

PFD:Pose-guided Feature Disentangling for Occluded Person Re-identification based on Transformer This repo is the official implementation of "Pose-gui

Tao Wang 93 Dec 18, 2022
A criticism of a recent paper on buggy image downsampling methods in popular image processing and deep learning libraries.

A criticism of a recent paper on buggy image downsampling methods in popular image processing and deep learning libraries.

70 Jul 12, 2022
Exploration-Exploitation Dilemma Solving Methods

Exploration-Exploitation Dilemma Solving Methods Medium article for this repo - HERE In ths repo I implemented two techniques for tackling mentioned t

Aman Mishra 6 Jan 25, 2022
Forecasting Nonverbal Social Signals during Dyadic Interactions with Generative Adversarial Neural Networks

ForecastingNonverbalSignals This is the implementation for the paper Forecasting Nonverbal Social Signals during Dyadic Interactions with Generative A

1 Feb 10, 2022
AITom is an open-source platform for AI driven cellular electron cryo-tomography analysis.

AITom Introduction AITom is an open-source platform for AI driven cellular electron cryo-tomography analysis. AITom is originated from the tomominer l

93 Jan 02, 2023
DeiT: Data-efficient Image Transformers

DeiT: Data-efficient Image Transformers This repository contains PyTorch evaluation code, training code and pretrained models for DeiT (Data-Efficient

Facebook Research 3.2k Jan 06, 2023
Fully Convolutional Refined Auto Encoding Generative Adversarial Networks for 3D Multi Object Scenes

Fully Convolutional Refined Auto-Encoding Generative Adversarial Networks for 3D Multi Object Scenes This repository contains the source code for Full

Yu Nishimura 106 Nov 21, 2022
A baseline code for VSPW

A baseline code for VSPW Preparation Download VSPW dataset The VSPW dataset with extracted frames and masks is available here.

28 Aug 22, 2022
This repo is customed for VisDrone.

Object Detection for VisDrone(无人机航拍图像目标检测) My environment 1、Windows10 (Linux available) 2、tensorflow = 1.12.0 3、python3.6 (anaconda) 4、cv2 5、ensemble

53 Jul 17, 2022
Object detection evaluation metrics using Python.

Object detection evaluation metrics using Python.

Louis Facun 2 Sep 06, 2022
Efficient Sparse Attacks on Videos using Reinforcement Learning

EARL This repository provides a simple implementation of the work "Efficient Sparse Attacks on Videos using Reinforcement Learning" Example: Demo: Her

12 Dec 05, 2021
Yolox-bytetrack-sample - Python sample of MOT (Multiple Object Tracking) using YOLOX and ByteTrack

yolox-bytetrack-sample YOLOXとByteTrackを用いたMOT(Multiple Object Tracking)のPythonサン

KazuhitoTakahashi 12 Nov 09, 2022
A computational block to solve entity alignment over textual attributes in a knowledge graph creation pipeline.

How to apply? Create your config.ini file following the example provided in config.ini Choose one of the options below to run: Run with Python3 pip in

Scientific Data Management Group 3 Jun 23, 2022
This repository contains the implementations related to the experiments of a set of publicly available datasets that are used in the time series forecasting research space.

TSForecasting This repository contains the implementations related to the experiments of a set of publicly available datasets that are used in the tim

Rakshitha Godahewa 80 Dec 30, 2022
Code to replicate the key results from Exploring the Limits of Out-of-Distribution Detection

Exploring the Limits of Out-of-Distribution Detection In this repository we're collecting replications for the key experiments in the Exploring the Li

Stanislav Fort 35 Jan 03, 2023
Code repository for the paper Computer Vision User Entity Behavior Analytics

Computer Vision User Entity Behavior Analytics Code repository for "Computer Vision User Entity Behavior Analytics" Code Description dataset.csv As di

Sameer Khanna 2 Aug 20, 2022