PyTorch implementation of Decoupled Spatial-Temporal Transformer for Video Inpainting

Last update: Dec 13, 2022

Overview

Decoupled Spatial-Temporal Transformer for Video Inpainting

By Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu Sun, Xiaogang Wang, Jifeng Dai, Hongsheng Li.

This repo is the official PyTorch implementation of Decoupled Spatial-Temporal Transformer for Video Inpainting.

Introduction

Usage

Prerequisites

Python >= 3.6
PyTorch >= 1.0 and the corresponding torchvision (https://pytorch.org/)
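
The version requirements above can be checked before installing the remaining packages. The following is a minimal sketch and not part of the repository: the helper names parse_version and check_prerequisites are hypothetical, and the check reads installed package metadata instead of importing PyTorch. The sketch itself relies on importlib.metadata, so it needs Python 3.8 or newer to run.

```python
import sys
from importlib import metadata


def parse_version(text):
    """Turn a version string such as '1.13.1+cu117' into a tuple of ints."""
    numbers = []
    # Drop any local build tag after '+', then read leading digits of each part.
    for part in text.split("+")[0].split("."):
        digits = ""
        for ch in part:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        numbers.append(int(digits))
    return tuple(numbers)


def check_prerequisites(min_python=(3, 6), min_torch=(1, 0)):
    """Return a dict mapping requirement name to True/False (hypothetical helper)."""
    results = {"python": sys.version_info[:2] >= min_python}
    # torchvision has no stated minimum here, so only its presence is checked.
    for package, minimum in (("torch", min_torch), ("torchvision", None)):
        try:
            installed = parse_version(metadata.version(package))
        except metadata.PackageNotFoundError:
            results[package] = False
            continue
        results[package] = minimum is None or installed >= minimum
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing or too old'}")
```

This only confirms that the packages are installed and recent enough; it does not verify that the torchvision build matches the installed PyTorch build.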

Install

Clone this repo:

git clone https://github.com/ruiliu-ai/DSTT.git

Install other packages:

cd DSTT
pip install -r requirements.txt

Training

Dataset preparation

Download the datasets (YouTube-VOS and DAVIS) into the data folder.

mkdir data

Training script

python train.py -c configs/youtube-vos.json
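
Before starting a long training run, it can help to look at what the config file actually contains. The sketch below is not part of the repository: load_config and summarize are hypothetical helpers. It assumes only that configs/youtube-vos.json is a JSON object and makes no assumption about which keys it holds.

```python
import json
from pathlib import Path


def load_config(path):
    """Read a JSON config file and return its contents."""
    with Path(path).open("r", encoding="utf-8") as handle:
        return json.load(handle)


def summarize(config, indent=0):
    """Return one text line per key, recursing into nested dicts."""
    lines = []
    for key, value in config.items():
        if isinstance(value, dict):
            lines.append(" " * indent + f"{key}:")
            lines.extend(summarize(value, indent + 2))
        else:
            lines.append(" " * indent + f"{key}: {value!r}")
    return lines


if __name__ == "__main__":
    config_path = Path("configs/youtube-vos.json")
    if config_path.exists():
        print("\n".join(summarize(load_config(config_path))))
    else:
        print(f"{config_path} not found; run this from the DSTT folder")
```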

Test

Download the pre-trained model into the checkpoints folder.

mkdir checkpoints

Test script

python test.py -c checkpoints/dstt.pth -v data/DAVIS/JPEGImages/blackswan -m data/DAVIS/Annotations/blackswan
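
A quick pre-flight check on the two input folders can catch path mistakes before the model is loaded. This is a minimal sketch, not part of the repository: list_images and check_inputs are hypothetical helpers, and the check assumes that frames and masks are stored as image files with one mask per frame, which the test command above does not state explicitly.

```python
from pathlib import Path

# Assumed image extensions; adjust if the dataset uses something else.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_images(folder):
    """Return the sorted image files directly inside a folder."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def check_inputs(video_dir, mask_dir):
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    for label, folder in (("video", video_dir), ("mask", mask_dir)):
        if not Path(folder).is_dir():
            problems.append(f"{label} folder not found: {folder}")
    if problems:
        return problems
    frames, masks = list_images(video_dir), list_images(mask_dir)
    if not frames:
        problems.append(f"no image files in {video_dir}")
    # Assumption: one mask image per frame.
    if len(frames) != len(masks):
        problems.append(f"{len(frames)} frames but {len(masks)} masks")
    return problems


if __name__ == "__main__":
    issues = check_inputs(
        "data/DAVIS/JPEGImages/blackswan",
        "data/DAVIS/Annotations/blackswan",
    )
    print("inputs look consistent" if not issues else "\n".join(issues))
```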

Citing DSTT

If you find DSTT useful in your research, please consider citing:

@article{Liu_2021_DSTT,
  title={Decoupled Spatial-Temporal Transformer for Video Inpainting},
  author={Liu, Rui and Deng, Hanming and Huang, Yangyi and Shi, Xiaoyu and Lu, Lewei and Sun, Wenxiu and Wang, Xiaogang and Dai, Jifeng and Li, Hongsheng},
  journal={arXiv preprint arXiv:2104.06637},
  year={2021}
}

Acknowledgement

This code relies heavily on the video inpainting framework from the Spatial-Temporal Transformer Network (STTN).
