Object detection on multiple datasets with an automatically learned unified label space.

Overview

Simple multi-dataset detection

An object detector trained on multiple large-scale datasets with a unified label space; Winning solution of ECCV 2020 Robust Vision Challenges.

Simple multi-dataset detection,
Xingyi Zhou, Vladlen Koltun, Philipp Krähenbühl,
arXiv technical report (arXiv 2102.13086)

Contact: [email protected]. Any questions or discussions are welcomed!

Abstract

How do we build a general and broad object detection system? We use all labels of all concepts ever annotated. These labels span diverse datasets with potentially inconsistent taxonomies. In this paper, we present a simple method for training a unified detector on multiple large-scale datasets. We use dataset-specific training protocols and losses, but share a common detection architecture with dataset-specific outputs. We show how to automatically integrate these dataset-specific outputs into a common semantic taxonomy. In contrast to prior work, our approach does not require manual taxonomy reconciliation. Our multi-dataset detector performs as well as dataset-specific models on each training domain, but generalizes much better to new unseen domains. Entries based on the presented methodology ranked first in the object detection and instance segmentation tracks of the ECCV 2020 Robust Vision Challenge.

Features at a glance

  • We trained a unified object detector on 4 large-scale detection datasets: COCO, Objects365, OpenImages, and Mapillary, with state-of-the-art performance on all of them.

  • The model predicts class labels in a learned unified label space.

  • The model can be directly used to test on novel datasets outside the training datasets.

  • In this repo, we also provide state-of-the-art baselines for Objects365 and OpenImages.

Main results

COCO test-challenge OpenImages public test Mapillary test Objects365 val
52.9 60.6 25.3 33.7

Results are obtained using a Cascade-RCNN with ResNeSt200 trained in an 8x schedule.

  • Unified model vs. ensemble of dataset-specific models with known test domains.
COCO Objects365 OpenImages mean.
Unified 45.4 24.4 66.0 45.3
Dataset-specific models 42.5 24.9 65.7 44.4

Results are obtained using a Cascade-RCNN with Res50 trained in an 8x schedule.

  • Zero-shot cross dataset evaluation
VOC VIPER CityScapes ScanNet WildDash CrowdHuman KITTI mean
Unified 82.9 21.3 52.6 29.8 34.7 70.7 39.9 47.3
Oracle models 80.3 31.8 54.6 44.7 - 80.0 - -

Results are obtained using a Cascade-RCNN with Res50 trained in an 8x schedule.

More models can be found in our MODEL ZOO.

Installation

Our project is developed on detectron2. Please follow the official detectron2 installation. All our code is under projects/UniDet/. In theory, you should be able to copy-paste projects/UniDet/ to the latest detectron2 release or your own detectron2 repo to run our project. There might be API changes in future detectron2 releases that make it incompatible.

Demo

We use the same inference API as detectorn2. To run inference on an image folder using our pretrained model, run

python projects/UniDet/demo/demo.py --config-file projects/UniDet/configs/Unified_learned_OCIM_R50_6x+2x.yaml --input images/*.jpg --opts MODEL.WEIGHTS models/Unified_learned_OCIM_R50_6x+2x.pth

If setup correctly, the output should look like:

*The sample image is from WildDash dataset.

Note that the model predicts all labels in its label hierarchy tree (for example, both vehicle and car for a car), following the protocol in OpenImages.

Benchmark evaluation and training

After installation, follow the instructions in DATASETS.md to setup the (many) datasets. Then check REPRODUCE.md to reproduce the results in the paper.

License

All our code under projects/Unidet/ is under Apache 2.0 license. The code from detectron2 follows the original Apache 2.0 license.

Citation

If you find this project useful for your research, please use the following BibTeX entry.

@inproceedings{zhou2021simple,
  title={Simple multi-dataset detection},
  author={Zhou, Xingyi and Koltun, Vladlen and Kr{\"a}henb{\"u}hl, Philipp},
  booktitle={arXiv preprint arXiv:2102.13086},
  year={2021}
}
Owner
Xingyi Zhou
CS Ph.D. student at UT Austin.
Xingyi Zhou
This is an unofficial PyTorch implementation of Meta Pseudo Labels

This is an unofficial PyTorch implementation of Meta Pseudo Labels. The official Tensorflow implementation is here.

Jungdae Kim 320 Jan 08, 2023
Deployment of PyTorch chatbot with Flask

Chatbot Deployment with Flask and JavaScript In this tutorial we deploy the chatbot I created in this tutorial with Flask and JavaScript. This gives 2

Patrick Loeber (Python Engineer) 107 Dec 29, 2022
Implementation of ProteinBERT in Pytorch

ProteinBERT - Pytorch (wip) Implementation of ProteinBERT in Pytorch. Original Repository Install $ pip install protein-bert-pytorch Usage import torc

Phil Wang 92 Dec 25, 2022
Custom IMDB Dataset is extracted between 2020-2021 and custom distilBERT model is trained for movie success probability prediction

IMDB Success Predictor Project involves Web Scraping custom IMDB data between 2020 and 2021 of 10000 movies and shows sorted by number of votes ,fine

Gautam Diwan 1 Jan 18, 2022
Some bravo or inspiring research works on the topic of curriculum learning.

Towards Scalable Unpaired Virtual Try-On via Patch-Routed Spatially-Adaptive GAN Official code for NeurIPS 2021 paper "Towards Scalable Unpaired Virtu

131 Jan 07, 2023
TriMap: Large-scale Dimensionality Reduction Using Triplets

TriMap TriMap is a dimensionality reduction method that uses triplet constraints to form a low-dimensional embedding of a set of points. The triplet c

Ehsan Amid 235 Dec 24, 2022
Research on controller area network Intrusion Detection Systems

Group members information Member 1: Lixue Liang Member 2: Yuet Lee Chan Member 3: Xinruo Zhang Member 4: Yifei Han User Manual Generate Attack Packets

Roche 4 Aug 30, 2022
Code for "Localization with Sampling-Argmax", NeurIPS 2021

Localization with Sampling-Argmax [Paper] [arXiv] [Project Page] Localization with Sampling-Argmax Jiefeng Li, Tong Chen, Ruiqi Shi, Yujing Lou, Yong-

JeffLi 71 Dec 17, 2022
Official Code For TDEER: An Efficient Translating Decoding Schema for Joint Extraction of Entities and Relations (EMNLP2021)

TDEER 🦌 🦒 Official Code For TDEER: An Efficient Translating Decoding Schema for Joint Extraction of Entities and Relations (EMNLP2021) Overview TDEE

33 Dec 23, 2022
Code for "Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks", CVPR 2021

Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks This repository contains the code that accompanies our CVPR 20

Despoina Paschalidou 161 Dec 20, 2022
Language Models for the legal domain in Spanish done @ BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).

Spanish legal domain Language Model ⚖️ This repository contains the page for two main resources for the Spanish legal domain: A RoBERTa model: https:/

Plan de Tecnologías del Lenguaje - Gobierno de España 12 Nov 14, 2022
A hybrid SOTA solution of LiDAR panoptic segmentation with C++ implementations of point cloud clustering algorithms. ICCV21, Workshop on Traditional Computer Vision in the Age of Deep Learning

ICCVW21-TradiCV-Survey-of-LiDAR-Cluster Motivation In contrast to popular end-to-end deep learning LiDAR panoptic segmentation solutions, we propose a

YimingZhao 103 Nov 22, 2022
Code for paper "Extract, Denoise and Enforce: Evaluating and Improving Concept Preservation for Text-to-Text Generation" EMNLP 2021

The repo provides the code for paper "Extract, Denoise and Enforce: Evaluating and Improving Concept Preservation for Text-to-Text Generation" EMNLP 2

Yuning Mao 18 May 24, 2022
YOLOv5 in PyTorch > ONNX > CoreML > TFLite

This repository represents Ultralytics open-source research into future object detection methods, and incorporates lessons learned and best practices evolved over thousands of hours of training and e

Ultralytics 34.1k Dec 31, 2022
Python scripts form performing stereo depth estimation using the HITNET model in ONNX.

ONNX-HITNET-Stereo-Depth-estimation Python scripts form performing stereo depth estimation using the HITNET model in ONNX. Stereo depth estimation on

Ibai Gorordo 30 Nov 08, 2022
SAGE: Sensitivity-guided Adaptive Learning Rate for Transformers

SAGE: Sensitivity-guided Adaptive Learning Rate for Transformers This repo contains our codes for the paper "No Parameters Left Behind: Sensitivity Gu

Chen Liang 23 Nov 07, 2022
Python scripts form performing stereo depth estimation using the high res stereo model in PyTorch .

PyTorch-High-Res-Stereo-Depth-Estimation Python scripts form performing stereo depth estimation using the high res stereo model in PyTorch. Stereo dep

Ibai Gorordo 26 Nov 24, 2022
An original implementation of "MetaICL Learning to Learn In Context" by Sewon Min, Mike Lewis, Luke Zettlemoyer and Hannaneh Hajishirzi

MetaICL: Learning to Learn In Context This includes an original implementation of "MetaICL: Learning to Learn In Context" by Sewon Min, Mike Lewis, Lu

Meta Research 141 Jan 07, 2023
Bayesian regularization for functional graphical models.

BayesFGM Paper: Jiajing Niu, Andrew Brown. Bayesian regularization for functional graphical models. Requirements R version 3.6.3 and up Python 3.6 and

0 Oct 07, 2021
Oscar and VinVL

Oscar: Object-Semantics Aligned Pre-training for Vision-and-Language Tasks VinVL: Revisiting Visual Representations in Vision-Language Models Updates

Microsoft 938 Dec 26, 2022