Unified tracking framework with a single appearance model

Last update: Dec 24, 2022

Overview

Paper: Do different tracking tasks require different appearance models?

[ArXiv] (coming soon) [Project Page] (coming soon)

UniTrack is a simple and Unified framework for versatile visual Tracking tasks.

As an important problem in computer vision, tracking has been fragmented into a multitude of different experimental setups. As a consequence, the literature has fragmented too, and novel approaches proposed by the community are now usually specialized to fit only one specific setup. To understand to what extent this specialization is actually necessary, we present UniTrack, a solution that addresses multiple different tracking tasks within the same framework. All tasks share the same universal appearance model. UniTrack enjoys the following advantages:

Does NOT need training on a specific tracking task.
Achieves good performance on existing tracking tasks, and can thus serve as a strong baseline for each task.
Can easily be adapted to novel tasks with different setups.
Can serve as an evaluation platform for testing pre-trained representations (e.g. self-supervised models) on tracking tasks.

Tasks & Framework

Tasks

We classify existing tracking tasks along four axes: (1) single or multiple targets; (2) targets specified by the user or by an automatic detector; (3) observation format (bounding box/mask/pose); (4) class-agnostic or class-specific (e.g. humans/vehicles). We mainly experiment on 5 tasks: SOT, VOS, MOT, MOTS, and PoseTrack. Task setups are summarized in the figure above.
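To make the four axes concrete, the five tasks can be encoded as a small Python data structure. This is a sketch for illustration only. The class and field names are hypothetical, and the placement of each task on the axes is our reading of the task definitions (e.g. DAVIS-2017 VOS provides first-frame masks for several objects), not code taken from the UniTrack repository.

```python
from dataclasses import dataclass
from enum import Enum


class Observation(Enum):
    BOX = "bounding box"
    MASK = "mask"
    POSE = "pose"


@dataclass(frozen=True)
class TaskSetup:
    """One tracking task placed along the four axes (hypothetical encoding)."""
    multi_target: bool        # axis 1: single (False) vs. multiple (True) targets
    user_specified: bool      # axis 2: targets given by the user (True) or by a detector (False)
    observation: Observation  # axis 3: observation format
    class_agnostic: bool      # axis 4: class-agnostic (True) vs. class-specific (False)


# Assumed placement of the five tasks; our reading, not copied from the project's figure.
TASKS = {
    "SOT": TaskSetup(False, True, Observation.BOX, True),
    "VOS": TaskSetup(True, True, Observation.MASK, True),
    "MOT": TaskSetup(True, False, Observation.BOX, False),
    "MOTS": TaskSetup(True, False, Observation.MASK, False),
    "PoseTrack": TaskSetup(True, False, Observation.POSE, False),
}
```

Filtering this table on `user_specified`, for example, yields `["SOT", "VOS"]`: the two tasks in which targets come from the user rather than from a detector.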

Appearance model

An appearance model is the only learnable component in UniTrack. It should provide a universal visual representation, and is usually pre-trained on a large-scale dataset in a supervised or unsupervised manner. Typical examples include ImageNet pre-trained ResNets (supervised) and recent self-supervised models such as MoCo and SimCLR (unsupervised).
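Because the appearance model is the only learnable part, it can be viewed as a frozen function that maps a frame to a feature map, which every task then consumes in the same way. The sketch below assumes that view. `AppearanceModel`, `toy_model`, and `embed_video` are hypothetical names, and `toy_model` is a hand-written stand-in rather than a real pre-trained network.

```python
import math
from typing import Callable, List

Frame = List[List[float]]                        # toy H x W grayscale frame
FeatureMap = List[List[List[float]]]             # H x W grid of C-dim feature vectors
AppearanceModel = Callable[[Frame], FeatureMap]  # hypothetical interface: frame -> features


def _unit(v: List[float]) -> List[float]:
    """L2-normalize a feature vector (returned unchanged if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else v


def toy_model(frame: Frame) -> FeatureMap:
    """Hypothetical stand-in for a frozen backbone such as an ImageNet ResNet-18.

    Each pixel becomes the vector [intensity, 1.0], L2-normalized. Nothing is learned here.
    """
    return [[_unit([px, 1.0]) for px in row] for row in frame]


def embed_video(model: AppearanceModel, frames: List[Frame]) -> List[FeatureMap]:
    """Run the same frozen model on every frame; downstream tasks only see these features."""
    return [model(f) for f in frames]
```

Under this view, swapping in a different representation (for example a self-supervised one) only changes which function is passed to `embed_video`. Nothing downstream is retrained.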

Propagation and Association

Propagation and association are the two fundamental algorithmic building blocks of UniTrack. Both take features extracted by the appearance model as input. For propagation, we adopt existing methods such as cross correlation, DCF, and mask propagation. For association, we employ a simple algorithm and develop a novel similarity metric to make full use of the appearance model.
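Both building blocks can be sketched on toy feature maps. Propagation is shown with plain cross correlation, one of the methods listed above: the template's features are slid over the search region, and the peak of the response map gives the new target position. Association is shown with greedy matching on cosine similarity. This is a simple stand-in, NOT UniTrack's own similarity metric, and the function names and the `min_sim` threshold are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
FeatureMap = List[List[Vector]]  # H x W grid of C-dim feature vectors


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def xcorr(template: FeatureMap, search: FeatureMap) -> List[List[float]]:
    """Propagation by cross correlation: at each 'valid' offset of the template inside
    the search map, sum the channel-wise dot products of overlapping feature vectors."""
    th, tw = len(template), len(template[0])
    sh, sw = len(search), len(search[0])
    response = []
    for dy in range(sh - th + 1):
        row = []
        for dx in range(sw - tw + 1):
            score = 0.0
            for i in range(th):
                for j in range(tw):
                    score += sum(a * b for a, b in zip(template[i][j], search[dy + i][dx + j]))
            row.append(score)
        response.append(row)
    return response


def peak(response: List[List[float]]) -> Tuple[int, int]:
    """(row, col) offset with the highest response, i.e. the propagated target position."""
    cells = ((r, c) for r in range(len(response)) for c in range(len(response[0])))
    return max(cells, key=lambda rc: response[rc[0]][rc[1]])


def associate(tracks: Sequence[Vector], detections: Sequence[Vector],
              min_sim: float = 0.5) -> List[Tuple[int, int]]:
    """Association by greedy cosine-similarity matching (a plain baseline, not
    UniTrack's metric). Returns sorted (track_idx, det_idx) pairs; pairs below
    the hypothetical min_sim threshold stay unmatched."""
    pairs = sorted(((cosine(t, d), ti, di)
                    for ti, t in enumerate(tracks)
                    for di, d in enumerate(detections)), reverse=True)
    used_t, used_d, matches = set(), set(), []
    for sim, ti, di in pairs:
        if sim < min_sim:
            break  # remaining pairs are even less similar
        if ti not in used_t and di not in used_d:
            used_t.add(ti)
            used_d.add(di)
            matches.append((ti, di))
    return sorted(matches)
```

For example, a 1x1 template `[[[1.0, 0.0]]]` correlated with a 2x3 search map whose only matching feature sits at row 1, column 2 peaks at `(1, 2)`. Greedy matching was chosen here for brevity. A full tracker would more commonly solve the assignment optimally, e.g. with the Hungarian algorithm.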

Results

Below we show results of UniTrack with a simple ImageNet pre-trained ResNet-18 as the appearance model. More results (other tasks/datasets, more visualizations) can be found in results.md.

Qualitative results

Single Object Tracking (SOT) on OTB-2015

Video Object Segmentation (VOS) on DAVIS-2017 val split

Multiple Object Tracking (MOT) on MOT-16 test set private detector track (Detections from FairMOT)

Multiple Object Tracking and Segmentation (MOTS) on MOTS challenge test set (Detections from COSTA_st)

Pose Tracking on PoseTrack-2018 val split (Detections from LightTrack)

Quantitative results

Single Object Tracking (SOT) on OTB-2015

Method	SiamFC	SiamRPN	SiamRPN++	UDT*	UDT+*	LUDT*	LUDT+*	UniTrack_XCorr*	UniTrack_DCF*
AUC	58.2	63.7	69.6	59.4	63.2	60.2	63.9	55.5	61.8

* indicates non-supervised methods

Video Object Segmentation (VOS) on DAVIS-2017 val split

Method	SiamMask	FeelVOS	STM	Colorization*	TimeCycle*	UVC*	CRW*	VFS*	UniTrack*
J-mean	54.3	63.7	79.2	34.6	40.1	56.7	64.8	66.5	58.4

* indicates non-supervised methods

Multiple Object Tracking (MOT) on MOT-16 test set private detector track

Method	POI	DeepSORT-2	JDE	CTrack	TubeTK	TraDes	CSTrack	FairMOT*	UniTrack*
IDF-1	65.1	62.2	55.8	57.2	62.2	64.7	71.8	72.8	71.8
IDs	805	781	1544	1897	1236	1144	1071	1074	683
MOTA	66.1	61.4	64.4	67.6	66.9	70.1	70.7	74.9	74.7

* indicates methods using the same detections

Multiple Object Tracking and Segmentation (MOTS) on MOTS challenge test set

Method	TrackRCNN	SORTS	PointTrack	GMPHD	COSTA_st*	UniTrack*
IDF-1	42.7	57.3	42.9	65.6	70.3	67.2
IDs	567	577	868	566	421	622
sMOTA	40.6	55.0	62.3	69.0	70.2	68.9

* indicates methods using the same detections

Pose Tracking on PoseTrack-2018 val split

Method	MDPN	OpenSVAI	Miracle	KeyTrack	LightTrack*	UniTrack*
IDF-1	-	-	-	-	52.2	73.2
IDs	-	-	-	-	3024	6760
sMOTA	50.6	62.4	64.0	66.6	64.8	63.5

* indicates methods using the same detections

Getting started

Demo

Update log

[2021.6.24]: Started writing docs; please stay tuned!

Acknowledgement

VideoWalk by Allan A. Jabri

SOT code by Zhipeng Zhang

Owner

ZhongdaoWang