Deep Video Matting via Spatio-Temporal Alignment and Aggregation [CVPR2021]

Last update: Dec 07, 2022

Introduction

Despite the significant progress made by deep learning in natural image matting, there has so far been no representative work on deep learning for video matting, owing to the inherent technical challenges of reasoning about the temporal domain and the lack of large-scale video matting datasets. In this paper, we propose a deep learning-based video matting framework which employs a novel and effective spatio-temporal feature aggregation module (ST-FAM). As optical flow estimation can be very unreliable within matting regions, ST-FAM is designed to effectively align and aggregate information across different spatial scales and temporal frames within the network decoder. To eliminate frame-by-frame trimap annotations, a lightweight interactive trimap propagation network is also introduced. The other contribution consists of a large-scale video matting dataset with groundtruth alpha mattes for quantitative evaluation and real-world high-resolution videos with trimaps for qualitative evaluation. Quantitative and qualitative experimental results show that our framework significantly outperforms conventional video matting and deep image matting methods applied to video in the presence of multi-frame temporal information.

Framework

Dataset

We composite foreground images and videos onto high-resolution background videos to generate a large-scale video matting dataset for training and testing. Follow the steps below to prepare the datasets. The directory structure is as follows.

DVM
  ├── fg
    ├── image
      ├── train
        ├── alpha
          ├── xxx.png
          ├── yyy.png
          ├── ...
        ├── fg
          ├── xxx.png
          ├── yyy.png
          ├── ...
      ├── test
        ├── alpha
          ├── xxx.png
          ├── yyy.png
          ├── ...
        ├── fg
          ├── xxx.png
          ├── yyy.png
          ├── ...
        ├── trimap
          ├── xxx.png
          ├── yyy.png
          ├── ...
    ├── video
      ├── train
        ├── 0000
          ├── a.mp4
          ├── f.mp4
        ├── ...
      ├── test
        ├── 0000
          ├── a.mp4
          ├── f.mp4
        ├── ...
  ├── bg
    ├── train
      ├── 0000.mp4
      ├── 0001.mp4
      ├── ...
    ├── test
      ├── 0000.mp4
      ├── 0001.mp4
      ├── ...
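Before running any processing, it can help to confirm that the directories in the tree above exist. The following is a minimal sketch, not part of this repo: the helper name `missing_dirs` and the default root `DVM` are assumptions made for illustration, and only the directory names shown in the tree are checked (file contents are not inspected).

```python
from pathlib import Path

# Sub-directories expected under the dataset root, copied from the tree above.
EXPECTED_DIRS = [
    "fg/image/train/alpha",
    "fg/image/train/fg",
    "fg/image/test/alpha",
    "fg/image/test/fg",
    "fg/image/test/trimap",
    "fg/video/train",
    "fg/video/test",
    "bg/train",
    "bg/test",
]


def missing_dirs(root):
    """Return the expected sub-directories that do not exist under `root`.

    `missing_dirs` is a hypothetical helper, not a function from this repo.
    """
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    import sys

    # Assumed default: the dataset root is a folder named "DVM".
    root = sys.argv[1] if len(sys.argv) > 1 else "DVM"
    missing = missing_dirs(root)
    if missing:
        print("Missing directories under", root)
        for d in missing:
            print("  ", d)
    else:
        print("Layout looks complete under", root)
```

Saved under a file name of your choice, the script takes the dataset root as its only argument and lists any directory that still needs to be created or filled.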

1. Please contact Brian Price ([email protected]) for the Adobe Image Matting dataset.
2. Put the training fg/alpha images and the testing fg/alpha/trimap images from the Adobe dataset in the corresponding directories.
3. Download the training/testing videos and place them in the corresponding directories.

   Link: https://pan.baidu.com/s/1yBJr0SqsEjDToVAUb8dSCw Password: l9ck
4. Use data/process.py to generate the training/testing datasets. About 1 TB of storage is needed.
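For readers unfamiliar with how a foreground is composited onto a background, the standard matting equation is I = alpha * F + (1 - alpha) * B, applied per pixel. The sketch below illustrates that equation for a single frame with NumPy. It is an assumption-laden illustration, not the code in data/process.py: the function name `composite`, the uint8 input convention, and the rounding step are all choices made here.

```python
import numpy as np


def composite(fg, bg, alpha):
    """Blend a foreground over a background: I = alpha * F + (1 - alpha) * B.

    Hypothetical helper, not taken from this repo.
    fg, bg: uint8 arrays of shape (H, W, 3).
    alpha:  uint8 array of shape (H, W), where 255 means fully opaque.
    """
    # Scale alpha to [0, 1] and add a channel axis so it broadcasts over RGB.
    a = alpha.astype(np.float32)[..., None] / 255.0
    out = a * fg.astype(np.float32) + (1.0 - a) * bg.astype(np.float32)
    # Round to the nearest integer and return to 8-bit range.
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


if __name__ == "__main__":
    h, w = 4, 6
    fg = np.full((h, w, 3), 200, dtype=np.uint8)
    bg = np.full((h, w, 3), 100, dtype=np.uint8)
    alpha = np.zeros((h, w), dtype=np.uint8)
    alpha[:, : w // 2] = 255  # left half opaque, right half transparent
    frame = composite(fg, bg, alpha)
    print(frame[0, 0], frame[0, -1])
```

For video, the same blend would be repeated for each frame pair, with the foreground, alpha, and background frames resized to a common resolution first.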

Reference

If you find our work useful in your research, please consider citing:

@inproceedings{sun2021dvm,
  author    = {Yanan Sun and Guanzhi Wang and Qiao Gu and Chi-Keung Tang and Yu-Wing Tai},
  title     = {Deep Video Matting via Spatio-Temporal Alignment and Aggregation},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year      = {2021},
}

Contact

If you have any questions or suggestions about this repo, please feel free to contact me ([email protected]).
