GitHub project for Attention-guided Temporal Coherent Video Object Matting.


Attention-guided Temporal Coherent Video Object Matting

This is the GitHub project for our paper Attention-guided Temporal Coherent Video Object Matting (arXiv:2105.11427). We provide our code, the supplementary material, the trained models, and the VideoMatting108 dataset here. For the trimap generation module, please see TCVOM-TGM.

The code, the trained model and the dataset are for academic and non-commercial use only.

The supplementary material can be found here.

VideoMatting108 Dataset

VideoMatting108 is a large video matting dataset that contains 108 video clips with their corresponding ground-truth alpha mattes, all in 1080p resolution. Of these, 80 clips are for training and 28 clips are for validation.

You can download the dataset here. The total size of the dataset is 192GB and we've split the archive into 1GB chunks.

The contents of the dataset are the following:

  • FG: contains the foreground RGBA images, where the alpha channel is the ground-truth matte and the RGB channels are the ground-truth foreground.
  • BG: contains the background RGB images used for composition.
  • flow_png_val: contains the quantized optical flow of the validation video clips, used for calculating the MESSDdt metric. You can skip this folder if you don't need to calculate this metric. Refer to the _flow_read() function in calc_metric.py for usage.
  • *_videos*.txt: train / val split.
  • frame_corr.json: FG / BG frame pair used for composition.
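
As a rough illustration of how the FG and BG folders fit together, the sketch below composites one RGBA foreground frame over one RGB background frame with the standard matting equation I = alpha * F + (1 - alpha) * B. The function name composite and the synthetic in-memory arrays are illustrative assumptions and are not part of this repository. In practice the arrays could come from, for example, cv2.imread(path, cv2.IMREAD_UNCHANGED), which returns the channels of a 4-channel PNG in BGRA order; the arithmetic is the same as long as foreground and background share one channel order.

```python
import numpy as np


def composite(fg_rgba: np.ndarray, bg_rgb: np.ndarray) -> np.ndarray:
    """Composite an RGBA foreground over an RGB background (uint8 in, uint8 out)."""
    if fg_rgba.shape[:2] != bg_rgb.shape[:2]:
        raise ValueError("foreground and background must have the same size")
    # The alpha channel stores the ground-truth matte; scale it to [0, 1].
    alpha = fg_rgba[..., 3:4].astype(np.float32) / 255.0
    fg = fg_rgba[..., :3].astype(np.float32)
    bg = bg_rgb.astype(np.float32)
    # Standard compositing equation: I = alpha * F + (1 - alpha) * B.
    image = alpha * fg + (1.0 - alpha) * bg
    return np.clip(np.rint(image), 0, 255).astype(np.uint8)


# Tiny synthetic 1x2 example: one opaque pixel and one fully transparent pixel.
fg = np.array([[[200, 100, 50, 255], [200, 100, 50, 0]]], dtype=np.uint8)
bg = np.array([[[10, 20, 30], [10, 20, 30]]], dtype=np.uint8)
out = composite(fg, bg)
print(out[0, 0], out[0, 1])
```

The opaque pixel keeps the foreground colour and the transparent pixel shows the background. Which FG frame pairs with which BG frame is defined by frame_corr.json, whose exact layout is not described here.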

After decompressing, the dataset folder should have the following structure (please rename flow_png_val to flow_png):

|---dataset
  |-FG_done
  |-BG_done
  |-flow_png
  |-frame_corr.json
  |-train_videos.txt
  |-train_videos_subset.txt
  |-val_videos.txt
  |-val_videos_subset.txt
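
Before running inference or training, it can help to confirm that the extracted folder matches the listing above. The following is a minimal sketch under that assumption: the helper missing_entries is hypothetical (it is not part of this repository) and only checks that the listed names exist, not their contents. The demo uses a temporary directory so that it runs without the real dataset.

```python
import tempfile
from pathlib import Path

# Entry names taken from the folder listing above.
EXPECTED_DIRS = ("FG_done", "BG_done", "flow_png")
EXPECTED_FILES = (
    "frame_corr.json",
    "train_videos.txt",
    "train_videos_subset.txt",
    "val_videos.txt",
    "val_videos_subset.txt",
)


def missing_entries(root, need_flow=True):
    """Return the expected entries that are absent under `root`."""
    root = Path(root)
    missing = []
    for name in EXPECTED_DIRS:
        if name == "flow_png" and not need_flow:
            continue  # optional: only needed for the MESSDdt metric
        if not (root / name).is_dir():
            missing.append(name)
    for name in EXPECTED_FILES:
        if not (root / name).is_file():
            missing.append(name)
    return missing


# Demo on a throwaway directory that contains only part of the layout.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "FG_done").mkdir()
    (demo_root / "BG_done").mkdir()
    (demo_root / "frame_corr.json").write_text("{}")
    demo_missing = missing_entries(demo_root, need_flow=False)
    print(demo_missing)
```

For the real dataset, call missing_entries("/path/to/VideoMatting108dataset"); an empty list means every listed entry was found.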

Models

Currently, our method supports four different image matting methods as the base method.

  • gca (GCA Matting by Li et al., code is from here)
  • dim (DeepImageMatting by Xu et al., we use the reimplementation code from here)
  • index (IndexNet Matting by Lu et al., code is from here)
  • fba (FBA Matting by Forte et al., code is from here)
    • There are some differences between our training and the original FBA paper. We believe that there is still room for further performance gains through hyperparameter fine-tuning.
      • We did not use the foreground extension technique during training. Also, we used four GPUs instead of one.
      • We used the conventional Adam optimizer instead of RAdam.
      • We used mean instead of sum during loss computation to keep the loss balanced (especially for L_af).
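
To illustrate why the choice of reduction matters for loss balance, here is a generic sketch using a plain L1 loss in NumPy. It is not the repository's loss code and does not implement L_af; the function l1_loss and the array sizes are assumptions chosen for illustration. With the same per-element error, a summed loss grows with the number of elements while a mean loss does not.

```python
import numpy as np


def l1_loss(pred, target, reduction="mean"):
    """L1 loss with either 'mean' or 'sum' reduction over all elements."""
    diff = np.abs(np.asarray(pred, dtype=np.float64) - np.asarray(target, dtype=np.float64))
    if reduction == "mean":
        return float(diff.mean())
    if reduction == "sum":
        return float(diff.sum())
    raise ValueError(f"unknown reduction: {reduction}")


# Same per-element error (0.1) on a small and a large array.
small_pred, small_gt = np.full((8, 8), 0.6), np.full((8, 8), 0.5)
large_pred, large_gt = np.full((64, 64), 0.6), np.full((64, 64), 0.5)

# Ratio of the large-array loss to the small-array loss for each reduction.
sum_ratio = l1_loss(large_pred, large_gt, "sum") / l1_loss(small_pred, small_gt, "sum")
mean_ratio = l1_loss(large_pred, large_gt, "mean") / l1_loss(small_pred, small_gt, "mean")
print(f"sum ratio: {sum_ratio:.1f}, mean ratio: {mean_ratio:.1f}")
```

The summed loss scales by the ratio of element counts (64x here), so a term computed over more elements would dominate a weighted total; averaging keeps the terms on a comparable scale.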

The trained models can be downloaded here. We provide four different sets of weights for every base method.

  • *_SINGLE_Lim.pth: The trained weights of the base image matting method on the VideoMatting108 dataset without TAM. Only L_im is used during training. This is the baseline model.
  • *_TAM_Lim_Ltc_Laf.pth: The trained weights of the base image matting method with TAM on the VideoMatting108 dataset. L_im, L_tc and L_af are used during training. This is our full model.
  • *_TAM_pretrain.pth: The pretrained weights of the base image matting method with TAM on the DIM dataset. Only L_im is used during training.
  • *_fe.pth: The weights converted from the original model checkpoint, used only for pretraining TAM.

Results

These are the quantitative results on the VideoMatting108 validation set with medium-width trimaps. The metrics are averaged over all 28 validation video clips.

We used CUDA 10.2 during inference. Using CUDA 11.1 might result in slightly lower metrics. All metrics are calculated with calc_metric.py.

Method              Loss            SSDA   dtSSD  MESSDdt  MSE*(10^3)  mSAD
GCA+F (Baseline)    L_im            55.82  31.64  2.15     8.20        40.85
GCA+TAM             L_im+L_tc+L_af  50.41  27.28  1.48     7.07        37.65
DIM+F (Baseline)    L_im            61.85  34.55  2.82     9.99        44.38
DIM+TAM             L_im+L_tc+L_af  58.94  29.89  2.06     9.02        43.28
Index+F (Baseline)  L_im            58.53  33.03  2.33     9.37        43.53
Index+TAM           L_im+L_tc+L_af  57.91  29.36  1.81     8.78        43.17
FBA+F (Baseline)    L_im            57.47  29.60  2.19     9.28        40.57
FBA+TAM             L_im+L_tc+L_af  51.57  25.50  1.59     7.61        37.24
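
To make the table easier to read, the short script below computes the relative reduction of each metric when moving from the baseline to the TAM model. The numbers are copied from the table above; the helper relative_reduction and the RESULTS dictionary are only an illustrative way to organize them. All five columns are error measures, so a positive reduction means the TAM model scores better.

```python
# (baseline, with TAM) pairs copied from the results table above.
RESULTS = {
    "GCA": {"SSDA": (55.82, 50.41), "dtSSD": (31.64, 27.28), "MESSDdt": (2.15, 1.48),
            "MSE*(10^3)": (8.20, 7.07), "mSAD": (40.85, 37.65)},
    "DIM": {"SSDA": (61.85, 58.94), "dtSSD": (34.55, 29.89), "MESSDdt": (2.82, 2.06),
            "MSE*(10^3)": (9.99, 9.02), "mSAD": (44.38, 43.28)},
    "Index": {"SSDA": (58.53, 57.91), "dtSSD": (33.03, 29.36), "MESSDdt": (2.33, 1.81),
              "MSE*(10^3)": (9.37, 8.78), "mSAD": (43.53, 43.17)},
    "FBA": {"SSDA": (57.47, 51.57), "dtSSD": (29.60, 25.50), "MESSDdt": (2.19, 1.59),
            "MSE*(10^3)": (9.28, 7.61), "mSAD": (40.57, 37.24)},
}


def relative_reduction(baseline, ours):
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline


for method, metrics in RESULTS.items():
    parts = [
        f"{name}: {relative_reduction(base, tam):.1f}%"
        for name, (base, tam) in metrics.items()
    ]
    print(f"{method:<6}", ", ".join(parts))
```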

Usage

Requirements

Python=3.8
Pytorch=1.6.0
numpy
opencv-python
imgaug
tqdm
yacs
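
A quick way to see which of these packages are visible to the current interpreter is sketched below. The helper missing_modules is hypothetical and not part of this repository. Note that the pip package opencv-python is imported as cv2 and PyTorch as torch. The check only tests whether a module can be found; it does not verify versions.

```python
import importlib.util
import sys

# Import names for the packages listed above.
REQUIRED_MODULES = ("torch", "numpy", "cv2", "imgaug", "tqdm", "yacs")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", ".".join(map(str, sys.version_info[:3])))
    print("Missing modules:", missing_modules() or "none")
```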

Inference

pred_single.py and pred_vmn.py automatically use all available CUDA devices. pred_test.py uses the cuda:0 device by default.

  • Inference on VideoMatting108 validation set using our full model

    • python pred_vmd.py --model {gca,dim,index,fba} --data /path/to/VideoMatting108dataset --load /path/to/weight.pth --trimap {wide,narrow,medium} --save /path/to/outdir
  • Inference on VideoMatting108 validation set using the baseline model

    • python pred_single.py --dataset vmd --model {gca,dim,index,fba} --data /path/to/VideoMatting108dataset --load /path/to/weight.pth --trimap {wide,narrow,medium} --save /path/to/outdir
  • Calculating metrics

    • python calc_metric.py --pred /path/to/prediction/result --data /path/to/VideoMatting108dataset
    • The result will be saved in metric.json inside /path/to/prediction/result. Use tail to see the final averaged result.

  • Inference on test video clips

    • First, prepare the data. Make sure the workspace folder has the following structure:

      |---workspace
        |---video1
          |---00000_rgb.png
          |---00000_trimap.png
          |---00001_rgb.png
          |---00001_trimap.png
          |---....
        |---video2
        |---video3
        |---...
      
    • python pred_test.py --gpu CUDA_DEVICES_NUMBER_SPLIT_BY_COMMA --model {gca,vmn_gca,dim,vmn_dim,index,vmn_index,fba,vmn_fba} --data /path/to/workspace --load /path/to/weight.pth --save /path/to/outdir [video1] [video2] ...
      • For the --model parameter, vmn_BASEMETHOD corresponds to our full model and BASEMETHOD corresponds to the baseline model.
      • If no video clip folder names are given on the command line, the script processes all video clips under /path/to/workspace.
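
The workspace layout shown above can be checked before running pred_test.py. The sketch below assumes, based on that listing, that every frame NNNNN_rgb.png should have a matching NNNNN_trimap.png in the same video folder; the helpers unpaired_frames and check_workspace are hypothetical and not part of this repository. The demo builds a small temporary workspace so that it runs without real data.

```python
import re
import tempfile
from pathlib import Path

# Matches file names such as 00000_rgb.png or 00000_trimap.png.
FRAME_RE = re.compile(r"^(\d+)_(rgb|trimap)\.png$")


def unpaired_frames(video_dir):
    """Return frame ids that have an rgb image or a trimap, but not both."""
    rgb, trimap = set(), set()
    for path in Path(video_dir).iterdir():
        match = FRAME_RE.match(path.name)
        if match:
            (rgb if match.group(2) == "rgb" else trimap).add(match.group(1))
    return sorted(rgb ^ trimap)


def check_workspace(workspace):
    """Map each video folder name to its list of unpaired frame ids."""
    return {
        d.name: unpaired_frames(d)
        for d in sorted(Path(workspace).iterdir())
        if d.is_dir()
    }


# Demo: video1 is missing the trimap for frame 00001, video2 is complete.
with tempfile.TemporaryDirectory() as tmp:
    ws = Path(tmp)
    for video, names in {
        "video1": ["00000_rgb.png", "00000_trimap.png", "00001_rgb.png"],
        "video2": ["00000_rgb.png", "00000_trimap.png"],
    }.items():
        (ws / video).mkdir()
        for name in names:
            (ws / video / name).touch()
    demo_report = check_workspace(ws)
    print(demo_report)
```

An empty list for a video means that all of its frames are paired.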

Training

PY_CMD="python -m torch.distributed.launch --nproc_per_node=NUMBER_OF_CUDA_DEVICES"
  • Pretrain TAM on DIM dataset. Please see cfgs/pretrain_vmn_BASEMETHOD.yaml for configuration and refer to dataset/DIM.py for dataset preparation.

    $PY_CMD pretrain_ddp.py --cfg cfgs/pretrain_vmn_index.yaml
  • Training our full method on VideoMatting108 dataset. This will load the pretrained TAM weight as initialization. Please see cfgs/vmd_vmn_BASEMETHOD_pretrained_30ep.yaml for configuration.

    $PY_CMD train_ddp.py --cfg /path/to/config.yaml
  • Training the baseline method on VideoMatting108 dataset without TAM. Please see cfgs/vmd_vmn_BASEMETHOD_pretrained_30ep_single.yaml for configuration.

    $PY_CMD train_single_ddp.py --cfg /path/to/config.yaml
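
When the training commands are driven from a Python script rather than a shell, the same launch line can be assembled as an argument list. This is only a sketch: launch_command is a hypothetical helper, the device count of 4 is an arbitrary example, and sys.executable is used in place of the bare python command so that the current interpreter is reused. The module name, the --nproc_per_node flag, the script name, and the --cfg flag are the ones shown in the commands above.

```python
import shlex
import sys


def launch_command(script, cfg, nproc_per_node):
    """Build the argv list for a torch.distributed.launch run of `script`."""
    if nproc_per_node < 1:
        raise ValueError("nproc_per_node must be at least 1")
    return [
        sys.executable, "-m", "torch.distributed.launch",
        f"--nproc_per_node={nproc_per_node}",
        script, "--cfg", cfg,
    ]


cmd = launch_command("pretrain_ddp.py", "cfgs/pretrain_vmn_index.yaml", 4)
# Print the command instead of running it; to start training, pass `cmd`
# to subprocess.run(cmd, check=True) from the repository root.
print(shlex.join(cmd))
```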

Contact

If you have any questions, please feel free to contact [email protected].
