[ACM MM 2021] Multiview Detection with Shadow Transformer (and View-Coherent Data Augmentation)

Last update: Dec 13, 2022

Multiview Detection with Shadow Transformer (and View-Coherent Data Augmentation) [arXiv] [paper]

@inproceedings{hou2021multiview,
  title={Multiview Detection with Shadow Transformer (and View-Coherent Data Augmentation)},
  author={Hou, Yunzhong and Zheng, Liang},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia (MM ’21)},
  year={2021}
}

Overview

We release the PyTorch code for MVDeTr, a state-of-the-art multiview pedestrian detector. Its performance gains come from a transformer architecture, updated loss terms, and view-coherent data augmentation. MVDeTr is also efficient: it can be trained on a single RTX 2080 Ti. This repo also includes a simplified version of MVDet, which likewise runs on a single RTX 2080 Ti.

MVDeTr Code

This repo is dedicated to the code for MVDeTr.

Dependencies

This code uses the following libraries:

python
pytorch & torchvision
numpy
matplotlib
pillow
opencv-python
kornia
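
As a quick sanity check before training, the snippet below reports which of the listed libraries cannot be found in the current environment. It is a minimal sketch: the helper name `missing_modules` is hypothetical, and the import names (`torch` for pytorch, `PIL` for pillow, `cv2` for opencv-python) are the usual ones for these packages rather than something this repo specifies.

```python
from importlib.util import find_spec

# Import names assumed for the libraries listed above
# (pytorch -> torch, pillow -> PIL, opencv-python -> cv2).
REQUIRED_MODULES = ["torch", "torchvision", "numpy", "matplotlib", "PIL", "cv2", "kornia"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```

An empty result only means the modules can be located; it does not check versions or CUDA support.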

Data Preparation

By default, all datasets are in ~/Data/. We use MultiviewX and Wildtrack in this project.

Your ~/Data/ folder should look like this:

Data
├── MultiviewX/
│   └── ...
└── Wildtrack/ 
    └── ...
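
The layout above can be verified with a few lines of Python before launching a run. This is only a sketch: `missing_datasets` is a hypothetical helper, and it checks nothing beyond the two top-level folder names shown above.

```python
from pathlib import Path

# Folder names taken from the layout above.
EXPECTED_DATASETS = ("MultiviewX", "Wildtrack")


def missing_datasets(root):
    """Return the expected dataset folders that are absent under `root`."""
    root = Path(root).expanduser()
    return [name for name in EXPECTED_DATASETS if not (root / name).is_dir()]


if __name__ == "__main__":
    # ~/Data/ is the default location mentioned above.
    print(missing_datasets("~/Data"))
```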

Code Preparation

Before running the code, go to multiview_detector/models/ops and run bash make.sh to build the deformable transformer (forked from Deformable DETR).

Training

To train the detectors, run the following:

python main.py -d wildtrack
python main.py -d multiviewx

Each run should automatically report evaluation results close to the reported 91.5% MODA on the Wildtrack dataset and 93.7% MODA on the MultiviewX dataset.
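
For readers unfamiliar with the metric, MODA (Multiple Object Detection Accuracy) is commonly defined as one minus the ratio of errors (misses plus false positives) to the number of ground-truth objects. The sketch below implements that standard definition only. It does not reproduce the matching step that decides which detections count as hits, and the counts in the example are made up for illustration.

```python
def moda(false_negatives, false_positives, num_ground_truth):
    """MODA = 1 - (misses + false positives) / number of ground-truth objects."""
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    return 1.0 - (false_negatives + false_positives) / num_ground_truth


if __name__ == "__main__":
    # Hypothetical counts, not results from this repo.
    print(f"MODA: {moda(false_negatives=5, false_positives=3, num_ground_truth=100):.1%}")
```

Because false positives are counted in the numerator, MODA can become negative when a detector produces many spurious detections.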

Architectures

This repo supports multiple architecture variants. For MVDeTr, please specify --world_feat deform_trans; for a similar fully convolutional architecture like MVDet, please specify --world_feat conv.
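
To compare the two variants on both datasets, the command lines can be assembled programmatically. The helper below is a hypothetical convenience wrapper that uses only the flags named in this README (`-d` and `--world_feat`). It assumes the two flags can be combined in a single call to main.py.

```python
def build_train_command(dataset, world_feat):
    """Assemble the argument list for one training run of main.py."""
    return ["python", "main.py", "-d", dataset, "--world_feat", world_feat]


if __name__ == "__main__":
    for dataset in ("wildtrack", "multiviewx"):
        for world_feat in ("deform_trans", "conv"):
            # Print each command; pass the list to subprocess.run to execute it.
            print(" ".join(build_train_command(dataset, world_feat)))
```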

Loss terms

This repo supports multiple loss terms. For the focal loss variant as in MVDeTr, please specify --use_mse 0; for the MSE loss as in MVDet, please specify --use_mse 1.
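
To illustrate the difference between the two options, the sketch below contrasts a plain MSE loss with a generic binary focal loss on a flattened occupancy map. It is a simplified illustration in pure Python, not the repo's implementation: the exact focal loss variant used by MVDeTr (for example, how soft targets are handled) may differ, and `gamma` and `eps` are assumed defaults.

```python
import math


def mse_loss(pred, target):
    """Mean squared error between predicted and target map values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def binary_focal_loss(pred, target, gamma=2.0, eps=1e-7):
    """Generic binary focal loss: -(1 - p_t)^gamma * log(p_t), averaged."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        p_t = p if t == 1 else 1.0 - p   # probability assigned to the true class
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(pred)


if __name__ == "__main__":
    target = [1, 0, 0, 0]
    pred = [0.6, 0.1, 0.05, 0.3]
    print("MSE  :", mse_loss(pred, target))
    print("focal:", binary_focal_loss(pred, target))
```

The `(1 - p_t)^gamma` factor shrinks the contribution of locations that are already predicted well, which matters when most ground-plane cells are empty. With `gamma=0` the function reduces to ordinary binary cross-entropy.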

Augmentations

This repo supports view-coherent data augmentation, which applies affine transformations to the per-view inputs and then inverts those transformations on the per-view feature maps to maintain multiview coherency.
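
The idea of applying a transformation and later undoing it can be shown at the level of a single pixel coordinate. The sketch below builds an affine matrix, inverts it analytically, and checks that a point returns to its original location. It is an assumption-laden illustration in pure Python: the function names are hypothetical, and the repo operates on whole images and feature maps rather than individual points.

```python
import math


def affine_matrix(angle_deg, scale, tx, ty):
    """3x3 homogeneous matrix: rotate and uniformly scale, then translate."""
    a = math.radians(angle_deg)
    c, s = scale * math.cos(a), scale * math.sin(a)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]


def invert_affine(m):
    """Invert [[a, b, tx], [c, d, ty], [0, 0, 1]] in closed form."""
    (a, b, tx), (c, d, ty) = m[0], m[1]
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("affine matrix is not invertible")
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    # The inverse translation is -A^{-1} t.
    return [[ia, ib, -(ia * tx + ib * ty)],
            [ic, id_, -(ic * tx + id_ * ty)],
            [0.0, 0.0, 1.0]]


def apply_affine(m, point):
    """Map a 2D point (x, y) through the affine matrix m."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


if __name__ == "__main__":
    aug = affine_matrix(angle_deg=10.0, scale=1.2, tx=15.0, ty=-8.0)
    original = (320.0, 240.0)
    augmented = apply_affine(aug, original)
    restored = apply_affine(invert_affine(aug), augmented)
    print("augmented:", augmented)
    print("restored :", restored)
```

Once the augmentation has been undone, locations line up with the original camera geometry again, so the same calibration can be used for every view regardless of which random transformation each view received.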

Pre-trained models

You can download the checkpoints at this link.


Owner

Yunzhong Hou
