iPOKE: Poking a Still Image for Controlled Stochastic Video Synthesis

Andreas Blattmann, Timo Milbich, Michael Dorkenwald, Björn Ommer

TL;DR We present iPOKE, a model for locally controlled, stochastic video synthesis based on poking a single pixel in a static scene, which enables users to animate still images with nothing more than simple mouse drags.

Arxiv | Project page | BibTeX

Table of contents

  1. Requirements
  2. Pretrained models
  3. Graphical User Interface
  4. Generating samples
  5. Data preparation
  6. Evaluation
  7. Train your own models
  8. Shout-outs
  9. BibTeX

Requirements

A suitable conda environment named ipoke can be created and activated with

conda env create -f ipoke.yml 
conda activate ipoke

Pretrained models

You can find all pretrained models here. Download the zip file, extract it to a <LOGDIR>, and create a symbolic link named logs to the extracted directory ipoke via

ln -s <LOGDIR>/ipoke logs
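The same link can also be created from Python, e.g. inside a setup script. This is a minimal sketch: link_pretrained_logs is a hypothetical helper that is not part of the repository, and it assumes the archive has already been extracted so that <LOGDIR>/ipoke exists.

```python
import os
from pathlib import Path


def link_pretrained_logs(logdir: str, link_name: str = "logs") -> Path:
    """Create `link_name` -> <logdir>/ipoke, mirroring `ln -s <LOGDIR>/ipoke logs`.

    Hypothetical helper for illustration; not part of the iPOKE code base.
    """
    target = Path(logdir) / "ipoke"
    if not target.is_dir():
        # The zip file has not been extracted to <logdir> yet.
        raise FileNotFoundError(f"expected extracted models in {target}")
    link = Path(link_name)
    # is_symlink() also catches dangling links, for which exists() is False.
    if link.is_symlink() or link.exists():
        raise FileExistsError(f"{link} already exists")
    os.symlink(target.resolve(), link, target_is_directory=True)
    return link
```

Refusing to overwrite an existing logs entry avoids silently replacing a directory that may already contain your own runs.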

Here's a list of all available pretrained models, which are contained in the extracted directories.

Dataset         Spatial video resolution   Model name    FVD (lower is better)
Poking Plants   128 x 128                  plants_128    63.06
Poking Plants   64 x 64                    plants_64     56.59
iPER            128 x 128                  iper_128      74.53
iPER            64 x 64                    iper_64       81.49
Human3.6m       128 x 128                  h36m_128      119.77
Human3.6m       64 x 64                    h36m_64       111.55
TaiChi-HD       128 x 128                  taichi_128    100.69
TaiChi-HD       64 x 64                    taichi_64     96.09

Make sure to first prepare the data before using our pretrained models.
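When scripting around the commands in this README, it can help to validate <MODEL_NAME> before launching a run. The sketch below simply transcribes the table above into a Python dict; PRETRAINED_MODELS, check_model_name and models_for are hypothetical helpers, not part of the repository.

```python
from typing import Dict, List, Optional, Tuple

# (dataset, spatial resolution, FVD) per model name, copied from the table above.
PRETRAINED_MODELS: Dict[str, Tuple[str, int, float]] = {
    "plants_128": ("Poking Plants", 128, 63.06),
    "plants_64": ("Poking Plants", 64, 56.59),
    "iper_128": ("iPER", 128, 74.53),
    "iper_64": ("iPER", 64, 81.49),
    "h36m_128": ("Human3.6m", 128, 119.77),
    "h36m_64": ("Human3.6m", 64, 111.55),
    "taichi_128": ("TaiChi-HD", 128, 100.69),
    "taichi_64": ("TaiChi-HD", 64, 96.09),
}


def check_model_name(name: str) -> str:
    """Raise early if `name` is not one of the provided pretrained models."""
    if name not in PRETRAINED_MODELS:
        known = ", ".join(sorted(PRETRAINED_MODELS))
        raise ValueError(f"unknown model name {name!r}; expected one of: {known}")
    return name


def models_for(dataset: str, resolution: Optional[int] = None) -> List[str]:
    """Model names for a dataset (optionally one resolution), lowest FVD first."""
    names = [
        n for n, (ds, res, _) in PRETRAINED_MODELS.items()
        if ds == dataset and (resolution is None or res == resolution)
    ]
    return sorted(names, key=lambda n: PRETRAINED_MODELS[n][2])
```

For example, models_for("Poking Plants") lists plants_64 before plants_128, since its FVD in the table is lower.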

Graphical User Interface

To interact with our models, launch the GUI via the command

python -m testing.gui --model_name <MODEL_NAME> --gpu <GPU_ID>

where <MODEL_NAME> should be one of the model names listed in the table of pretrained models above and <GPU_ID> is the ID of the GPU to run on.

Generating samples

Controlled stochastic video synthesis

Samples can also be generated automatically, using pokes simulated from optical flow, via

python -W ignore main.py --config config/second_stage.yaml --gpus <GPU_IDs> --model_name <MODEL_NAME> --test samples

The resulting videos will be saved to <LOGDIR>/ipoke/second_stage/generated/<MODEL_NAME>/samples_best_fvd.

Kinematics transfer

Moreover, iPOKE can transfer kinematics between videos of persons with similar starting poses. Such transfer results can be generated with

python -W ignore main.py --config config/second_stage.yaml --gpus <GPU_IDs> --model_name <MODEL_NAME> --test transfer

The resulting videos will be saved to <LOGDIR>/ipoke/second_stage/generated/<MODEL_NAME>/transfer. NOTE: This is currently only possible for the iPER dataset.

Control sensitivity

To observe the results from different pokes at the same pixel, you can run

python -W ignore main.py --config config/second_stage.yaml --gpus <GPU_IDs> --model_name <MODEL_NAME> --test control_sensititvity

The resulting videos will be saved to <LOGDIR>/ipoke/second_stage/generated/<MODEL_NAME>/poke_dir_samples_best_fvd. NOTE: This is currently only possible for the iPER dataset.
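Since each --test mode writes to its own subdirectory, a small helper can compute where to look for the results. The sketch below only encodes the paths and iPER restrictions stated in this section; output_dir is a hypothetical helper, and detecting iPER models by the iper prefix of the model name is an assumption that holds for the provided pretrained models but not necessarily for your own.

```python
from pathlib import Path

# Output subdirectory for each --test mode, as documented above.
OUTPUT_SUBDIRS = {
    "samples": "samples_best_fvd",
    "transfer": "transfer",
    "control_sensititvity": "poke_dir_samples_best_fvd",  # spelled as in the command above
}
# Modes that are currently only possible for iPER models.
IPER_ONLY = {"transfer", "control_sensititvity"}


def output_dir(logdir: str, model_name: str, test_mode: str) -> Path:
    """Return <logdir>/ipoke/second_stage/generated/<model_name>/<subdir>."""
    if test_mode not in OUTPUT_SUBDIRS:
        raise ValueError(f"unsupported test mode {test_mode!r}")
    # Assumption: iPER models are recognized by their name prefix.
    if test_mode in IPER_ONLY and not model_name.startswith("iper"):
        raise ValueError(f"--test {test_mode} is currently only possible for iPER models")
    return (Path(logdir) / "ipoke" / "second_stage" / "generated"
            / model_name / OUTPUT_SUBDIRS[test_mode])
```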

Data Preparation

Get FlowNet2 and PoseHRNet for data processing

Preparing the data to evaluate our pretrained models or to train new ones requires estimating optical flow maps and human poses (currently only supported for iPER). We therefore added the respective models, FlowNet2 and PoseHRNet, as git submodules. To clone them, run

git submodule init
git submodule sync
git submodule update

Since FlowNet2 requires cuda-10.0 and is therefore not compatible with our main conda environment, we provide a separate conda environment for optical flow estimation, which can be created via

conda env create -f data_proc.yml

You can activate the environment and specify the right cuda version by using

source activate_data_proc

from the root of this repository. IMPORTANT: You have to ensure that lines 3 and 4 of the activate_data_proc script add the directories of your cuda-10.0 installation to the PATH and LD_LIBRARY_PATH environment variables. This environment is only required for generating the datasets and is not needed afterwards. Finally, build the custom layers of FlowNet2 and PoseHRNet with

cd models/flownet2
bash install.sh -ccbin <PATH_TO_GCC7>
cd ../pose_estimator/lib
make

where <PATH_TO_GCC7> is the path to your gcc-7 binary, usually /usr/bin/gcc-7 on a Linux server. When running these commands, make sure that the data_proc environment is activated and that the environment variables point to your cuda-10.0 installation (running source activate_data_proc takes care of both).
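Before building the custom layers, a quick check that the environment variables were set as required can save a failed build. This is a hypothetical sketch, not part of the repository; it assumes your CUDA 10.0 installation lives in a directory whose path contains cuda-10.0 (e.g. /usr/local/cuda-10.0), which may not hold for custom installs.

```python
import os
from typing import Mapping, Optional


def uses_cuda10(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if both PATH and LD_LIBRARY_PATH contain a cuda-10.0 entry.

    Heuristic (assumption): looks for the substring "cuda-10.0" in the
    individual path entries, as in default installs like /usr/local/cuda-10.0/bin.
    """
    env = os.environ if env is None else env

    def has_cuda10(var: str) -> bool:
        entries = env.get(var, "").split(os.pathsep)
        return any("cuda-10.0" in entry for entry in entries)

    return has_cuda10("PATH") and has_cuda10("LD_LIBRARY_PATH")
```

After source activate_data_proc, calling uses_cuda10() in the same shell session should return True.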

Poking Plants

Download the Poking Plants dataset from here and extract it to a <TARGETDIR>, which will then contain the raw video files. To extract the split (multi-part) zip archive, use

zip -s 0 poking_plants.zip --out poking_plants_unsplit.zip
unzip poking_plants_unsplit.zip

To extract the individual frames and estimate optical flow, set the field raw_dir in config/data_preparation/plants.yaml to <TARGETDIR>, set the target location for the extracted frames via the field processed_dir (all frames of each video will be stored in a separate directory), and run

source activate_data_proc
python -m utils.prepare_dataset --config config/data_preparation/plants.yaml

You can significantly speed up the optical flow estimation by running several instances of FlowNet2 in parallel: the num_workers argument sets the number of parallel runs, which are distributed among the GPUs whose IDs are specified in target_gpus.
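To get a feel for how num_workers and target_gpus interact, the sketch below assigns workers to GPUs round-robin. Round-robin is an assumption made for illustration; the actual scheduling in utils.prepare_dataset may differ, and assign_workers is a hypothetical helper. Under this assumption, num_workers: 4 with target_gpus: [0, 1] would place two FlowNet2 runs on each GPU, so the available GPU memory limits how many workers are useful.

```python
from typing import Dict, List


def assign_workers(num_workers: int, target_gpus: List[int]) -> Dict[int, int]:
    """Map each worker index to a GPU id by cycling through target_gpus."""
    if num_workers < 1 or not target_gpus:
        raise ValueError("need at least one worker and one GPU")
    return {w: target_gpus[w % len(target_gpus)] for w in range(num_workers)}
```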

iPER

Download the zipped videos in iPER_1024_video_release.zip from this website (note that you have to create a Microsoft account to get access) and extract the archive to a <TARGETDIR> as in the above example. On the same website you will also find train.txt and val.txt; download these files and save them in <TARGETDIR>. Again, set the undefined value of the field raw_dir in config/data_preparation/iper.yaml to <TARGETDIR>, define the target location for the extracted frames and the optical flow via processed_dir, and run

python -m utils.prepare_dataset --config config/data_preparation/iper.yaml

with the data_proc environment activated (via source activate_data_proc).

Human3.6m

First, you will need to create an account on the homepage of the Human3.6m dataset to gain access to the data. After your account has been created and approved (this takes a couple of hours), log in and inspect your browser cookies to find your PHPSESSID. Enter this PHPSESSID in data/config.ini and also specify the TARGETDIR there, where the extracted videos will later be stored. After setting the field processed_dir in config/data_preparation/human36m.yaml, you can download and extract the videos via

source activate_data_proc
python -m data.human36m_preprocess

Frame extraction and optical flow estimation are then performed as usual with

source activate_data_proc
python -m data.prepare_dataset --config config/data_preparation/human36m.yaml

TaiChi-HD

To download and extract the videos, follow the steps listed on the download page of this dataset and set the out_folder argument of the script load_videos.py to your <TARGETDIR>. Again, set the fields raw_dir and processed_dir in config/data_preparation/taichi.yaml as in the above examples and run

source activate_data_proc
python -m data.prepare_dataset --config config/data_preparation/taichi.yaml

to extract the individual frames and estimate the optical flow maps.

Evaluation

To reproduce the quantitative results presented in the paper for all our provided pretrained models, run

python -m testing.eval_models --gpu <GPU_ID> -e <TEST_MODE>

where <TEST_MODE> is one of fvd, accuracy, diversity, or kps_acc. The models to be evaluated are specified in the file config/model_names.txt. The possible values of <TEST_MODE> are:

  • fvd: Compute FVD scores. If you encounter TensorFlow errors due to missing libraries, add LD_LIBRARY_PATH=/usr/local/<LOCAL_CUDA_VERSION>/targets/x86_64-linux/lib/ before the above command (tested under Ubuntu 20.04 LTS).
  • accuracy: Calculate the accuracy scores (LPIPS, SSIM, PSNR) as explained in the paper. Results are printed to the console and also saved to logs/second_stage/generated/<MODEL_NAME>/metrics/ for the respective model.
  • diversity: Calculate the diversity scores (based on LPIPS and MSE) as explained in the paper. Results are printed to the console and also saved to logs/second_stage/generated/<MODEL_NAME>/metrics/ for the respective model.
  • kps_acc: Targeted keypoint accuracy, computed only for the poked body parts. For a detailed explanation, see Fig. 8 and the respective section in the paper. Only supported for the models trained on the iPER dataset.
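Before a long evaluation run, the selected models and <TEST_MODE> can be sanity-checked. This is a hypothetical sketch: it assumes config/model_names.txt lists one model name per line (blank lines skipped), and, as in the naming of the provided models, that iPER models start with iper; neither assumption is confirmed by the repository.

```python
from pathlib import Path
from typing import List

TEST_MODES = ("fvd", "accuracy", "diversity", "kps_acc")


def read_model_names(path: str = "config/model_names.txt") -> List[str]:
    """Read model names, assuming one name per line; blank lines are skipped."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def check_test_mode(mode: str, model_names: List[str]) -> None:
    """Raise ValueError for unknown modes or kps_acc on non-iPER models."""
    if mode not in TEST_MODES:
        raise ValueError(f"TEST_MODE must be one of {TEST_MODES}, got {mode!r}")
    if mode == "kps_acc":
        # Assumption: iPER models are recognized by their name prefix.
        non_iper = [n for n in model_names if not n.startswith("iper")]
        if non_iper:
            raise ValueError(f"kps_acc is only supported for iPER models, not {non_iper}")
```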

To calculate the metrics for only one of our models, or to test your own model, run

python -W ignore main.py --config config/second_stage.yaml --model_name <MODEL_NAME> --gpus <GPU_IDs> --test <TEST_MODE>

As above, if TensorFlow errors caused by missing libraries occur when calculating FVD scores, add LD_LIBRARY_PATH=/usr/local/<LOCAL_CUDA_VERSION>/targets/x86_64-linux/lib/ before the command.
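If you launch evaluations from a Python script rather than a shell, the LD_LIBRARY_PATH workaround can be passed as an explicit environment. Prefixing VAR=value to a shell command sets (rather than extends) the variable for that command only, and the sketch mirrors this by overwriting the key in a copy of the environment. fvd_env and eval_command are hypothetical helpers; the arguments are passed through verbatim, and a value such as cuda-11.0 for <LOCAL_CUDA_VERSION> is only an example.

```python
import os
from typing import Dict, List, Mapping, Optional


def fvd_env(cuda_version: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with LD_LIBRARY_PATH set as described above."""
    env = dict(os.environ if base is None else base)
    env["LD_LIBRARY_PATH"] = f"/usr/local/{cuda_version}/targets/x86_64-linux/lib/"
    return env


def eval_command(model_name: str, gpus: str, test_mode: str) -> List[str]:
    """Argument list equivalent to the main.py evaluation command above."""
    return ["python", "-W", "ignore", "main.py",
            "--config", "config/second_stage.yaml",
            "--model_name", model_name, "--gpus", gpus, "--test", test_mode]
```

Usage (from the repository root): subprocess.run(eval_command("iper_64", "0", "fvd"), env=fvd_env("cuda-11.0"), check=True).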

Train your own models

As stated in our paper, our overall training procedure is divided into two main stages. Our invertible model preserves dimensionality between input and output, so to keep its training tractable we first pretrain a video autoencoding framework that yields latent video codes of much smaller dimensionality than the original videos. We then train our conditional invertible generative model on these compressed video representations.
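Why the compression stage matters can be seen from a single invertible building block. The toy sketch below implements a generic affine coupling layer (in the style of RealNVP) on a flat list of numbers. It is not the authors' architecture, which per the shout-outs below builds on masked convolutional normalizing flows, and toy_scale and toy_shift are arbitrary placeholders for learned networks. Because the output has exactly as many entries as the input, a flow applied directly to raw videos would have to operate at the full video dimensionality.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Net = Callable[[Vector], Vector]


def toy_scale(h: Vector) -> Vector:
    # Placeholder for a learned scale network; tanh keeps the scales bounded.
    return [0.5 * math.tanh(v) for v in h]


def toy_shift(h: Vector) -> Vector:
    # Placeholder for a learned shift network.
    return [v * v for v in h]


def _split(v: Vector) -> Tuple[Vector, Vector]:
    if len(v) % 2:
        raise ValueError("this toy layer expects an even number of entries")
    half = len(v) // 2
    return v[:half], v[half:]


def coupling_forward(x: Vector, s: Net = toy_scale, t: Net = toy_shift) -> Tuple[Vector, float]:
    """y1 = x1, y2 = x2 * exp(s(x1)) + t(x1); returns (y, log|det J|)."""
    x1, x2 = _split(x)
    scale, shift = s(x1), t(x1)
    y2 = [xi * math.exp(si) + ti for xi, si, ti in zip(x2, scale, shift)]
    # The Jacobian is triangular, so log|det J| is the sum of the scales.
    return x1 + y2, sum(scale)


def coupling_inverse(y: Vector, s: Net = toy_scale, t: Net = toy_shift) -> Vector:
    """x1 = y1, x2 = (y2 - t(y1)) * exp(-s(y1)); exact inverse of coupling_forward."""
    y1, y2 = _split(y)
    scale, shift = s(y1), t(y1)
    x2 = [(yi - ti) * math.exp(-si) for yi, si, ti in zip(y2, scale, shift)]
    return y1 + x2
```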

For logging our runs, we used (and recommend) wandb. Please create a free account and add your username to the config. During training of both the video autoencoding model (first stage) and the invertible model (second stage), we save the checkpoints with the smallest FVD score during evaluation. As the original FVD implementation is only available in TensorFlow, we created a custom PyTorch FVD model which we use during training (for evaluation, we use the original implementation). The computed scores do not coincide with the original ones, but they are strongly correlated. Therefore, this metric serves well when optimizing the model with respect to FVD.
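The checkpoint selection rule described above (keep the checkpoint with the lowest FVD seen so far) can be sketched as follows. BestFVDCheckpoint is a hypothetical helper that only illustrates the rule; the repository's training loop handles this internally, and during training the FVD values come from the custom PyTorch proxy rather than the original implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BestFVDCheckpoint:
    """Track the lowest FVD seen so far and decide when to save a checkpoint."""

    best_fvd: float = float("inf")
    best_step: Optional[int] = None

    def update(self, step: int, fvd: float) -> bool:
        """Return True if this evaluation improved the FVD (i.e. should be saved)."""
        if fvd < self.best_fvd:
            self.best_fvd, self.best_step = fvd, step
            return True
        return False
```

With illustrative (made-up) evaluations at steps 1000 to 4000 with FVD 210.0, 180.5, 195.2 and 170.1, checkpoints would be saved at steps 1000, 2000 and 4000.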

Video autoencoding model

To train our video autoencoding model, run the following command

python -W ignore main.py --config config/first_stage.yaml --gpus <GPU_ID> --model_name <MODEL_NAME>

The training data, model architecture and video resolution can be specified in config/first_stage.yaml; see the comments there for an explanation of the parameters.

If you have trained such a model and want to use it for subsequent training of our invertible second-stage model, you can add it to the first_stage_models dict in the file models/pretrained_models.py by specifying the <MODEL_NAME> and the path to the checkpoint file you want to use.

Invertible generative model

Our conditional invertible model can be trained via the command

python -W ignore main.py --config config/second_stage.yaml --gpus <GPU_ID> --model_name <MODEL_NAME>

Again, the data and model hyperparameters can be specified in the config file config/second_stage.yaml. We also provide config files with the exact parameters used for our pretrained models; these can be found in config/pretrained_models/.

As our invertible models rely on pretrained networks (video autoencoding models as well as encoders for the source image x_0 and the poke c), you have to specify these models in the config. We provide such pretrained models for all considered datasets at video resolutions 64 x 64 and 128 x 128. They are selected automatically, based on the keys specified in the config files, when a run is started. All available pretrained models and their keys can be found, and extended, in models/pretrained_models.py.
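The key-based selection can be pictured as a lookup from config keys into registries. In this hypothetical sketch, only the dict names first_stage_models, poke_embedder and conditioner come from this README; the value format (a plain checkpoint path), the config key names and the paths are placeholders, so check models/pretrained_models.py for the real structure.

```python
from typing import Dict, Mapping

# Placeholder registries; the real ones live in models/pretrained_models.py.
first_stage_models: Dict[str, str] = {"my_first_stage": "/path/to/my_first_stage.ckpt"}
poke_embedder: Dict[str, str] = {"my_poke_encoder": "/path/to/my_poke_encoder.ckpt"}
conditioner: Dict[str, str] = {"my_img_encoder": "/path/to/my_img_encoder.ckpt"}


def resolve_pretrained(config: Mapping[str, str],
                       registries: Mapping[str, Mapping[str, str]]) -> Dict[str, str]:
    """Look up each config key's model name in its registry; fail loudly if missing."""
    resolved = {}
    for cfg_key, registry in registries.items():
        name = config[cfg_key]
        if name not in registry:
            raise KeyError(f"{cfg_key}={name!r} is not registered; "
                           "add it to models/pretrained_models.py")
        resolved[cfg_key] = registry[name]
    return resolved
```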

Poke encoder

To train a new poke encoder, run the following command

python -W ignore main.py --config config/poke_encoder.yaml --gpus <GPU> --model_name <MODEL_NAME>

As for our video autoencoding framework, you can add your final trained model to the poke_embedder dict in models/pretrained_models.py.

Source image encoder

To train a new source image encoder, run the following command

python -W ignore main.py --config config/img_encoder.yaml --gpus <GPU> --model_name <MODEL_NAME>

As for our video autoencoding framework, you can add your final trained model to the conditioner dict in models/pretrained_models.py.

cVAE baseline

Finally, we also provide code to train the cVAE baseline used in the ablation study of our paper. To train such a model, run

python -W ignore main.py --config config/baseline_vae.yaml --gpus <GPU> --model_name <MODEL_NAME>

Shout-outs

Thanks to everyone who makes their code and models available. In particular,

  • The Wolf library, from where we borrowed the basic operations for our masked convolutional normalizing flow implementation
  • Our 3D encoder and discriminator are based on 3D-Resnet, and our spatial discriminator is adapted from PatchGAN
  • The deep-feature-based metrics we used: LPIPS and FVD

BibTeX

@misc{blattmann2021ipoke,
      title={iPOKE: Poking a Still Image for Controlled Stochastic Video Synthesis}, 
      author={Andreas Blattmann and Timo Milbich and Michael Dorkenwald and Björn Ommer},
      year={2021},
      eprint={2107.02790},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Owner
CompVis Heidelberg
Computer Vision research group at the Ruprecht-Karls-University Heidelberg