Learning Tracking Representations via Dual-Branch Fully Transformer Networks

DualTFR

⭐ We were the runner-up in both VOT2021ST (short-term) and VOT2021RT (real-time). Variants of DualTFR took 3rd/4th place in VOT2020RT and 4th place in VOT2020ST.

Model weights for the VOT21 challenge:

We provide the models for five trackers here: SAMN, SAMN_DiMP, DualTFR, DualTFRst, and DualTFRon.

Note that the AlphaRefine (https://github.com/MasterBin-IIAU/AlphaRefine) and SuperDiMP (https://github.com/visionml/pytracking) models are identical to those released by their original authors.

Tracker	Number of models	Model files
SAMN	1	SAMN.tar
SAMN_DiMP	2	super_dimp.pth.tar, SAMN.tar
DualTFR	2	DualTFR.tar, ar.pth.tar
DualTFRst	2	DualTFRst.tar, ar.pth.tar
DualTFRon	2	DualTFRon.tar, ar.pth.tar
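
Once the files are downloaded, a quick check can confirm that every checkpoint a tracker needs is present. The following is a minimal sketch, not part of the released code: the file names are copied from the table above, while the helper `missing_checkpoints` and the assumption that all files sit directly in one folder are hypothetical.

```python
from pathlib import Path

# Checkpoint files per tracker, copied from the table above.
REQUIRED_FILES = {
    "SAMN": ["SAMN.tar"],
    "SAMN_DiMP": ["super_dimp.pth.tar", "SAMN.tar"],
    "DualTFR": ["DualTFR.tar", "ar.pth.tar"],
    "DualTFRst": ["DualTFRst.tar", "ar.pth.tar"],
    "DualTFRon": ["DualTFRon.tar", "ar.pth.tar"],
}


def missing_checkpoints(tracker, model_dir):
    """Return the checkpoint files for `tracker` that are absent from `model_dir`.

    Hypothetical helper: assumes every file is stored directly in `model_dir`.
    """
    folder = Path(model_dir)
    return [name for name in REQUIRED_FILES[tracker] if not (folder / name).is_file()]
```

For example, `missing_checkpoints("DualTFR", "./checkpoints")` returns an empty list when both `DualTFR.tar` and `ar.pth.tar` are in `./checkpoints` (the folder name is only an illustration).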

Models can be downloaded from BaiduNetDisk or GoogleDrive:

BaiduNetDisk:

https://pan.baidu.com/s/1RHA7HVlXtNEzYPGIjJbQ-g (extraction code: sruh)

GoogleDrive:

https://drive.google.com/drive/folders/1Z61_mfh2vwzqDxejt5idBOgYhWOCZOr5?usp=sharing

Code will be released soon.

We present a simple Siamese-like dual-branch network built solely on Transformers to learn tracking features. Given a template and a search image, we divide them into non-overlapping image patches and extract a feature vector for each patch based on how it matches the other patches within an attention window. Then, for each token, we estimate whether it contains the target object and the corresponding target size. The main advantage of this approach is that the features are learned from matching and, ultimately, for matching, so they are aligned with the downstream tracking task. The method achieves results comparable to the best-performing methods, which first use a CNN to extract features and then use a Transformer to fuse them. Without bells and whistles, it outperforms the state-of-the-art methods on the GOT-10k and VOT2020 benchmarks. In addition, it runs in real time (about 40 fps).
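
The two basic operations described above, splitting an image into non-overlapping patches and matching each patch against the others inside an attention window, can be illustrated with a short NumPy sketch. This is not the DualTFR implementation: the function names, the patch and window sizes, and the use of identity query/key/value projections (no learned weights) are all assumptions made for illustration.

```python
import numpy as np


def patchify(image, patch):
    """Split an (H, W, C) image into non-overlapping patch tokens.

    Returns an array of shape (H // patch, W // patch, patch * patch * C).
    """
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image size must be a multiple of patch"
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    grid = grid.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, C)
    return grid.reshape(h // patch, w // patch, patch * patch * c)


def window_attention(tokens, window):
    """Self-attention restricted to non-overlapping windows of tokens.

    Illustrative only: queries, keys and values are the tokens themselves.
    """
    rows, cols, dim = tokens.shape
    assert rows % window == 0 and cols % window == 0, "grid must be a multiple of window"
    # Group tokens by window: (num_windows, window * window, dim).
    x = tokens.reshape(rows // window, window, cols // window, window, dim)
    x = x.transpose(0, 2, 1, 3, 4).reshape(-1, window * window, dim)
    # Scaled dot-product matching scores between tokens of the same window.
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(dim)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    out = weights @ x
    # Undo the window grouping to recover the (rows, cols, dim) token grid.
    out = out.reshape(rows // window, cols // window, window, window, dim)
    return out.transpose(0, 2, 1, 3, 4).reshape(rows, cols, dim)
```

Each output token is a weighted average of the tokens in its own window, so a token's feature depends only on how it matches its neighbours. In the actual network, learned projections and a per-token prediction of target presence and size would follow; those parts are not sketched here.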

Acknowledgments

Thanks to the great PyTracking library, which helped us implement our ideas quickly.
We use the implementation of the Swin Transformer from the official repo https://github.com/microsoft/Swin-Transformer.

Contacts

Fei Xie, School of Automation, Southeast University, China, [email protected], wechat: 372998044


Owner

phiphi
