[AAAI 2021] The source code for our paper "Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion".

Overview

DSM

The source code for the paper "Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion".

Project Website

The dataset lists, some visualizations, and the pretrained weights are still being prepared.

1. Introduction (scene-dominated to motion-dominated)

Video datasets are usually scene-dominated. We propose to decouple the scene and the motion (DSM) with two simple operations, so that the model pays more attention to motion information.
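As a rough illustration of how a triplet can steer a model toward motion, the sketch below computes a standard triplet margin loss on toy embeddings. It is not taken from this repository: the function names, the cosine distance, the margin value, and the reading of the triplet (anchor = original clip, positive = same motion with a changed scene, negative = same scene with broken motion) are all assumptions made for the example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_distance(a, b):
    """1 - cosine similarity between two embeddings."""
    a, b = l2_normalize(a), l2_normalize(b)
    return 1.0 - sum(x * y for x, y in zip(a, b))


def triplet_margin_loss(anchor, positive, negative, margin=0.5):
    """Hypothetical loss: pull the positive closer than the negative by `margin`."""
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D embeddings (illustrative values only).
anchor = [1.0, 0.0]
positive = [0.9, 0.1]   # assumed: same motion, different scene
negative = [0.0, 1.0]   # assumed: same scene, broken motion

loss_good = triplet_margin_loss(anchor, positive, negative)
loss_bad = triplet_margin_loss(anchor, negative, positive)
print(loss_good, loss_bad)
```

When the embedding already places the positive much closer to the anchor than the negative, the loss is zero; swapping the two produces a large penalty.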

The generated triplet is as below:

What does DSM learn?

With DSM pretraining, the model learns to focus on motion regions (not necessarily the actor) without any labels available.

2. Installation

Dataset

Please refer to dataset.md for details.

Requirements

  • Python3
  • PyTorch 1.1+
  • PIL
  • Intel (on the fly decode)

3. Structure

  • datasets
    • list
      • hmdb51: the train/val lists of HMDB51
      • ucf101: the train/val lists of UCF101
      • kinetics-400: the train/val lists of kinetics-400
      • diving48: the train/val lists of diving48
  • experiments
    • logs: detailed experiment records
    • gradientes: gradient check
    • visualization:
  • src
    • data: load data
    • loss: the losses evaluated in this paper
    • model: network architectures
    • scripts: train/eval scripts
    • augment: detailed implementation of the spatio-temporal augmentation
    • utils
    • feature_extract.py: feature extractor for a given pretrained model
    • main.py: the main entry point for finetuning
    • trainer.py
    • option.py
    • pt.py: self-supervised pretrain
    • ft.py: supervised finetune

DSM(Triplet)/DSM/Random

Self-supervised Pretrain

Kinetics
bash scripts/kinetics/pt.sh
UCF101
bash scripts/ucf101/pt.sh

Supervised Finetune (Clip-level)

HMDB51
bash scripts/hmdb51/ft.sh
UCF101
bash scripts/ucf101/ft.sh
Kinetics
bash scripts/kinetics/ft.sh

Video-level Evaluation

Following the common practice of TSN and Non-local, the final video-level result is the average over 10 temporally sampled windows with corner crops, which yields better results than clip-level evaluation. Refer to test.py for details.
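The averaging step can be sketched as follows. This is a minimal illustration rather than the code in test.py: the function name and the toy scores are hypothetical, and each "view" stands for the class scores of one sampled window and crop.

```python
def video_level_prediction(view_scores):
    """Average per-view class scores and return (predicted_class, mean_scores).

    view_scores: list of lists with shape [num_views][num_classes], where each
    inner list holds the class scores predicted for one clip/crop of the video.
    """
    if not view_scores:
        raise ValueError("at least one view is required")
    num_views = len(view_scores)
    num_classes = len(view_scores[0])
    mean_scores = [
        sum(view[c] for view in view_scores) / num_views
        for c in range(num_classes)
    ]
    predicted = max(range(num_classes), key=mean_scores.__getitem__)
    return predicted, mean_scores


# Toy example with 3 views and 3 classes (illustrative values only).
views = [
    [0.6, 0.3, 0.1],  # this single clip would vote for class 0
    [0.2, 0.5, 0.3],
    [0.1, 0.7, 0.2],
]
label, mean_scores = video_level_prediction(views)
print(label, mean_scores)
```

In this toy case the first clip alone would pick class 0, while the averaged scores pick class 1, which is the kind of disagreement that video-level averaging smooths out.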

Pretrain and Eval in One Step

bash scripts/hmdb51/pt_and_ft_hmdb51.sh

Notice: More training options and ablation studies can be found in scripts.

Video Retrieval and Other Visualization

(1). Feature Extractor

As STCR can be easily extended to other video representation tasks, we offer a script to perform feature extraction.

python feature_extractor.py

The features will be saved as a single NumPy file with shape [video_nums, features_dim] for further visualization.

(2). Retrieval Evaluation

Modify lines 60-62 in reterival.py.

python reterival.py
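For readers unfamiliar with the @k metrics reported in the results, the sketch below shows one common way to compute them from extracted features. It is not the code in reterival.py: the function names, the cosine-similarity ranking, and the rule that a query counts as a hit when any of its top-k gallery neighbours shares its class are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def recall_at_k(query_feats, query_labels, gallery_feats, gallery_labels, ks=(1, 5)):
    """Fraction of queries whose top-k gallery neighbours contain the same class."""
    hits = {k: 0 for k in ks}
    for feat, label in zip(query_feats, query_labels):
        # Rank gallery items from most to least similar to the query.
        ranked = sorted(
            range(len(gallery_feats)),
            key=lambda i: cosine_similarity(feat, gallery_feats[i]),
            reverse=True,
        )
        for k in ks:
            if any(gallery_labels[i] == label for i in ranked[:k]):
                hits[k] += 1
    return {k: hits[k] / len(query_feats) for k in ks}


# Toy features (illustrative values only).
gallery_feats = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
gallery_labels = [0, 1, 1]
query_feats = [[0.9, 0.1], [0.6, 0.8]]
query_labels = [0, 0]

result = recall_at_k(query_feats, query_labels, gallery_feats, gallery_labels, ks=(1, 3))
print(result)
```

The first query retrieves a same-class item at rank 1, while the second only finds one at rank 3, so the toy recall is 0.5 at k=1 and 1.0 at k=3.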

Results

Action Recognition

UCF101 Pretrained (I3D)

Method                 UCF101  HMDB51
Random Initialization  47.9    29.6
MoCo Baseline          62.3    36.5
DSM(Triplet)           70.7    48.5
DSM                    74.8    52.5

Kinetics Pretrained

Video Retrieval (UCF101-C3D)

Method  @1    @5    @10   @20   @50
DSM     16.8  33.4  43.4  54.6  70.7

Video Retrieval (HMDB51-C3D)

Method  @1    @5    @10   @20   @50
DSM     8.2   25.9  38.1  52.0  75.0

More Visualization

Acknowledgement

This work is partly based on STN, UEL and MoCo.

License

Citation

If you use our code in your research or wish to refer to the baseline results, please use the following BibTeX entry.

@inproceedings{wang2020enhancing,
  author    = {Wang, Jinpeng and Gao, Yuting and Li, Ke and Hu, Jianguo and Jiang, Xinyang and Guo, Xiaowei and Ji, Rongrong and Sun, Xing},
  title     = {Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion},
  booktitle = {AAAI},
  year      = {2021},
}
Owner
Jinpeng Wang
Focus on Biometrics and Video Understanding, Self/Semi Supervised Learning.