[AAAI2021] The source code for our paper 《Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion》.

Last update: Oct 16, 2022

Overview

DSM

The source code for the paper "Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion".

Project Website

The dataset lists, some visualizations, and the pretrained weights are still being prepared.

1. Introduction (scene-dominated to motion-dominated)

Video datasets are usually scene-dominated. We propose to decouple the scene and the motion (DSM) with two simple operations, so that the model pays more attention to the motion information.
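To make the idea of learning from a triplet concrete, here is a minimal sketch of a margin-based triplet loss on clip embeddings. It is a generic illustration and not the loss implemented in this repository: the function names, the margin value, and the toy embeddings are all assumptions, and it says nothing about how the positive and negative clips are generated.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_margin_loss(anchor, positive, negative, margin=0.5):
    """Hinge loss: pull the positive towards the anchor, push the negative away.

    The margin of 0.5 is an arbitrary illustrative value.
    """
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    return max(0.0, squared_distance(a, p) - squared_distance(a, n) + margin)


# Toy 2-D embeddings, made up for illustration only.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]   # embedding of the positive clip, close to the anchor
negative = [0.0, 1.0]   # embedding of the negative clip, far from the anchor

loss_good = triplet_margin_loss(anchor, positive, negative)
loss_bad = triplet_margin_loss(anchor, negative, positive)  # roles swapped
```

With well-separated embeddings the loss is zero. Swapping the positive and the negative gives a large loss, which is the signal that drives training.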

The generated triplet is as below:

What does DSM learn?

With DSM pretraining, the model learns to focus on the motion region (not necessarily the actor) without using a single label.

2. Installation

Dataset

Please refer to dataset.md for details.

Requirements

Python3
PyTorch 1.1+
PIL
Intel (on the fly decode)

3. Structure

datasets
- list
  - hmdb51: the train/val lists of HMDB51
  - ucf101: the train/val lists of UCF101
  - kinetics-400: the train/val lists of kinetics-400
  - diving48: the train/val lists of diving48
experiments
- logs: detailed experiment records
- gradientes: gradient checks
- visualization:
src
- data: load data
- loss: the losses evaluated in this paper
- model: network architectures
- scripts: train/eval scripts
- augment: detailed implementation of the spatio-temporal augmentation
- utils
- feature_extract.py: feature extractor given pretrained model
- main.py: the main entry point for finetuning
- trainer.py
- option.py
- pt.py: self-supervised pretrain
- ft.py: supervised finetune

DSM(Triplet)/DSM/Random

Self-supervised Pretrain

Kinetics

bash scripts/kinetics/pt.sh

UCF101

bash scripts/ucf101/pt.sh

Supervised Finetune (Clip-level)

HMDB51

bash scripts/hmdb51/ft.sh

UCF101

bash scripts/ucf101/ft.sh

Kinetics

bash scripts/kinetics/ft.sh

Video-level Evaluation

Following the common practice of TSN and Non-local, the final video-level result is the average over 10 temporally sampled windows combined with corner crops, which gives better results than clip-level evaluation. Refer to test.py for details.
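The averaging step can be sketched as follows. This is a simplified illustration and not the code in test.py: the function names, the evenly spaced window placement, and the toy scores are assumptions. Each entry of `clip_scores` stands for the class scores of one (temporal window, crop) pair.

```python
def temporal_window_starts(num_frames, clip_len, num_windows=10):
    """Evenly spaced start indices for `num_windows` clips of `clip_len` frames."""
    last_start = max(num_frames - clip_len, 0)
    if num_windows == 1:
        return [last_start // 2]
    return [round(i * last_start / (num_windows - 1)) for i in range(num_windows)]


def video_level_scores(clip_scores):
    """Average the per-class scores over all clips of a single video."""
    num_clips = len(clip_scores)
    num_classes = len(clip_scores[0])
    return [sum(scores[c] for scores in clip_scores) / num_clips
            for c in range(num_classes)]


def predict(clip_scores):
    """Index of the class with the highest averaged score."""
    avg = video_level_scores(clip_scores)
    return max(range(len(avg)), key=avg.__getitem__)


# Where the 10 windows would start in a 100-frame video with 16-frame clips.
starts = temporal_window_starts(num_frames=100, clip_len=16, num_windows=10)

# Toy scores for three clips of one video over two classes (made up).
clip_scores = [[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]
clip_level_prediction = predict(clip_scores[:1])   # first clip only
video_level_prediction = predict(clip_scores)      # all clips averaged
```

In this toy case the first clip alone votes for class 0, while the average over all clips votes for class 1. This shows how aggregating several views can correct a single misleading clip.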

Pretrain And Eval In one step

bash scripts/hmdb51/pt_and_ft_hmdb51.sh

Note: more training options and ablation studies can be found in scripts.

Video Retrieval and Other Visualization

(1). Feature Extractor

As STCR can easily be extended to other video representation tasks, we provide a script for feature extraction.

python feature_extractor.py

The features are saved as a single NumPy file with shape [video_nums, features_dim] for further visualization.
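As a rough sketch of how such a file can be consumed, the snippet below writes a random feature matrix of that shape, loads it back, and L2-normalises each row. The file name `features.npy`, the sizes, and the normalisation step are assumptions for illustration. The actual output path is determined by the extraction script.

```python
import os
import tempfile

import numpy as np

# Stand-in for the extractor output: one row per video.
video_nums, features_dim = 4, 8
features = np.random.rand(video_nums, features_dim).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "features.npy")  # hypothetical file name
    np.save(path, features)
    loaded = np.load(path)

# L2-normalise each row so that dot products become cosine similarities.
norms = np.linalg.norm(loaded, axis=1, keepdims=True)
normalized = loaded / np.clip(norms, 1e-12, None)
```

Row-normalised features are a convenient starting point for nearest-neighbour retrieval or for 2-D projections such as t-SNE.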

(2). Retrieval Evaluation

Modify lines 60-62 in reterival.py.

python reterival.py
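For readers unfamiliar with the @k metric, here is a minimal sketch of top-k retrieval accuracy. It is not the code in reterival.py. It assumes the common protocol in which each query feature is matched against a gallery by cosine similarity and counts as a hit if any of the k nearest gallery items shares its class. All names and toy values are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def top_k_accuracy(query_feats, query_labels, gallery_feats, gallery_labels, ks=(1, 5)):
    """Fraction of queries with a same-class item among their k nearest gallery items."""
    hits = {k: 0 for k in ks}
    for feat, label in zip(query_feats, query_labels):
        # Gallery indices sorted from most to least similar to the query.
        ranked = sorted(range(len(gallery_feats)),
                        key=lambda i: cosine_similarity(feat, gallery_feats[i]),
                        reverse=True)
        for k in ks:
            if any(gallery_labels[i] == label for i in ranked[:k]):
                hits[k] += 1
    return {k: hits[k] / len(query_feats) for k in ks}


# Toy data, made up for illustration.
gallery_feats = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
gallery_labels = [0, 1, 1]
query_feats = [[0.9, 0.1], [0.8, 0.3]]
query_labels = [0, 1]

result = top_k_accuracy(query_feats, query_labels,
                        gallery_feats, gallery_labels, ks=(1, 2))
```

Here the first query is matched correctly at rank 1, while the second only finds a same-class item at rank 2, so the accuracy rises from 0.5 at k=1 to 1.0 at k=2.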

Results

Action Recognition

UCF101 Pretrained (I3D)

Method	UCF101	HMDB51
Random Initialization	47.9	29.6
MoCo Baseline	62.3	36.5
DSM(Triplet)	70.7	48.5
DSM	74.8	52.5

Kinetics Pretrained

Video Retrieval (UCF101-C3D)

Method	@1	@5	@10	@20	@50
DSM	16.8	33.4	43.4	54.6	70.7

Video Retrieval (HMDB51-C3D)

Method	@1	@5	@10	@20	@50
DSM	8.2	25.9	38.1	52.0	75.0

More Visualization

Acknowledgement

This work is partly based on STN, UEL and MoCo.

License

Citation

If you use our code in your research or wish to refer to the baseline results, please use the following BibTeX entry.

@inproceedings{wang2020enhancing,
  author    = {Wang, Jinpeng and Gao, Yuting and Li, Ke and Hu, Jianguo and Jiang, Xinyang and Guo, Xiaowei and Ji, Rongrong and Sun, Xing},
  title     = {Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion},
  booktitle = {AAAI},
  year      = {2021},
}

Owner

Jinpeng Wang