Dense Unsupervised Learning for Video Segmentation (NeurIPS*2021)

Overview

This repository contains the official implementation of our paper:

Dense Unsupervised Learning for Video Segmentation
Nikita Araslanov, Simone Schaub-Meyer and Stefan Roth
To appear at NeurIPS*2021. [paper] [supp] [talk] [example results] [arXiv]


We efficiently learn spatio-temporal correspondences
without any supervision, and achieve state-of-the-art
accuracy on video object segmentation.

Contact: Nikita Araslanov fname.lname (at) visinf.tu-darmstadt.de


Installation

Requirements. To reproduce our results, we recommend Python >=3.6, PyTorch >=1.4, and CUDA >=10.0. At least one Titan X GPU (12 GB) or equivalent is required. The code was primarily developed under PyTorch 1.8 on a single A100 GPU.

The following steps will set up a local copy of the repository.

  1. Create conda environment:
conda create --name dense-ulearn-vos
source activate dense-ulearn-vos
  2. Install PyTorch >=1.4 (see PyTorch instructions). For example on Linux, run:
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
  3. Install the dependencies:
pip install -r requirements.txt
  4. Download the data:
Dataset        Website   Target directory with video sequences
YouTube-VOS    Link      data/ytvos/train/JPEGImages/
OxUvA          Link      data/OxUvA/images/dev/
TrackingNet    Link      data/tracking/train/jpegs/
Kinetics-400   Link      data/kinetics400/video_jpeg/train/

The last column of the table specifies the directories (relative to the project root) that contain the extracted video frames. You may use a different directory structure; in that case, adjust the paths in data/filelists/ for each dataset accordingly.

  5. Download filelists:
cd data/filelists
bash download.sh

This will download lists of training and validation paths for all datasets.
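
If you want to verify the data layout before training, a quick check is to count the extracted frames in each target directory. The paths below follow the table above; adjust them if you use a different layout:

# count video frames per dataset (paths as in the table above)
find data/ytvos/train/JPEGImages -type f | wc -l
find data/OxUvA/images/dev -type f | wc -l
find data/tracking/train/jpegs -type f | wc -l
find data/kinetics400/video_jpeg/train -type f | wc -l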

Training

The following bash script will train a ResNet-18 model from scratch on one of the four supported datasets (see above):

bash ./launch/train.sh [ytvos|oxuva|track|kinetics]

We also provide our final models for download.

Dataset        Mean J&F (DAVIS-2017)   Link                              MD5
OxUvA          65.3                    oxuva_e430_res4.pth (132M)        af541[...]d09b3
YouTube-VOS    69.3                    ytvos_e060_res4.pth (132M)        c3ae3[...]55faf
TrackingNet    69.4                    trackingnet_e088_res4.pth (88M)   3e7e9[...]95fa9
Kinetics-400   68.7                    kinetics_e026_res4.pth (88M)      086db[...]a7d98
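
After downloading a model, you can check that the file is intact by comparing its checksum with the MD5 column above. A minimal example, assuming the checkpoint was saved to a snapshots/ directory (the location is only an assumption):

# compare the output with the MD5 value listed in the table
md5sum snapshots/ytvos_e060_res4.pth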

Inference and evaluation

Inference

To run inference, use launch/infer_vos.sh:

bash ./launch/infer_vos.sh [davis|ytvos]

The first argument selects the validation dataset (davis for DAVIS-2017; ytvos for YouTube-VOS). The bash variables declared in the script set up the paths for the input data, the pre-trained model, and the output directory:

  • EXP, RUN_ID and SNAPSHOT determine the pre-trained model to load.
  • VER specifies a suffix for the output directory (useful for experimenting with different label-propagation configurations). Please refer to launch/infer_vos.sh for their usage.

The inference script will create two directories with the results: [res3|res4|key]_vos and [res3|res4|key]_vis, where the prefix corresponds to the codename of the output CNN layer used in the evaluation (selected in infer_vos.sh via the KEY variable). The vos-directory contains the segmentation results ready for evaluation; the vis-directory contains the results for visualisation. You can optionally disable generating the visualisation by setting VERBOSE=False in infer_vos.py.
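
For illustration only, the variable block in launch/infer_vos.sh could be edited along these lines; all values below are hypothetical placeholders rather than the defaults shipped with the repository:

# hypothetical example values inside launch/infer_vos.sh
EXP=ytvos                     # experiment name (placeholder)
RUN_ID=run01                  # run identifier (placeholder)
SNAPSHOT=ytvos_e060_res4.pth  # pre-trained model to load (see the table above)
KEY=res4                      # output CNN layer used for evaluation
VER=baseline                  # suffix for the output directory (placeholder)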

Evaluation: DAVIS-2017

Please use the official evaluation package. Clone and install the repository, then run:

python evaluation_method.py --task semi-supervised --davis_path data/davis2017 --results_path <path-to-vos-directory>
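
For reference, one possible way to set up the official DAVIS 2017 evaluation toolkit is sketched below; the repository URL and install step are our assumption of the upstream layout at the time of writing:

# obtain and install the official DAVIS 2017 evaluation toolkit (assumed layout)
git clone https://github.com/davisvideochallenge/davis2017-evaluation
cd davis2017-evaluation
pip install -r requirements.txt
python evaluation_method.py --task semi-supervised --davis_path <path-to-davis2017> --results_path <path-to-vos-directory>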

Evaluation: YouTube-VOS 2018

Please use the official CodaLab evaluation server. To create the submission, rename the vos-directory to Annotations and compress it to Annotations.zip for uploading.
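
A minimal sketch of preparing the upload, assuming the inference output is in results/res4_vos (a hypothetical path):

# rename the vos-directory and compress it for the CodaLab submission
mv results/res4_vos Annotations
zip -r Annotations.zip Annotations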

Acknowledgements

We thank the PyTorch contributors and Allan Jabri for releasing their implementation of label propagation.

Citation

We hope you find our work useful. If you would like to acknowledge it in your project, please use the following citation:

@inproceedings{Araslanov:2021:DUL,
  author    = {Araslanov, Nikita and Schaub-Meyer, Simone and Roth, Stefan},
  title     = {Dense Unsupervised Learning for Video Segmentation},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  volume    = {34},
  year      = {2021}
}