Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes

Last update: Dec 30, 2022

Overview

Neural Scene Flow Fields

PyTorch implementation of the paper "Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes", CVPR 2021

[Project Website] [Paper] [Video]

Dependency

The code is tested with Python 3, PyTorch >= 1.6, and CUDA >= 10.2. The dependencies include:

configargparse
matplotlib
opencv
scikit-image
scipy
cupy
imageio
tqdm
kornia
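
Before running the scripts, it can help to confirm that the packages above are importable. The following is a minimal sketch and not part of the repository. The import names (for example `cv2` for opencv and `skimage` for scikit-image) are the usual ones for these packages but are assumptions here, and the helper name `find_missing` is hypothetical.

```python
import importlib.util

# Import names for the packages listed above (assumed mapping:
# opencv -> cv2, scikit-image -> skimage, PyTorch -> torch).
REQUIRED_MODULES = [
    "torch", "configargparse", "matplotlib", "cv2", "skimage",
    "scipy", "cupy", "imageio", "tqdm", "kornia",
]


def find_missing(modules):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

The check only looks up each module without importing it, so it does not verify versions or that CUDA is usable.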

Video preprocessing

Download nerf_data.zip from link, an example input video with SfM camera poses and intrinsics estimated by COLMAP. (Note: you need to run the COLMAP "colmap image_undistorter" command to undistort the input images and produce the "dense" folder shown in the example; this folder should contain the "images" and "sparse" folders.)
Download the single-view depth prediction model "model.pt" from link, and put it in the folder "nsff_scripts".
Run the following commands to generate the required inputs for training/inference:

    # Usage
    cd nsff_scripts
    # create camera intrinsics/extrinsic format for NSFF, same as original NeRF where it uses imgs2poses.py script from the LLFF code: https://github.com/Fyusion/LLFF/blob/master/imgs2poses.py
    python save_poses_nerf.py --data_path "/home/xxx/Neural-Scene-Flow-Fields/kid-running/dense/"
    # Resize input images and run the single-view depth model.
    # Argument resize_height: resized image height for model training; the width is resized to keep the original aspect ratio.
    python run_midas.py --data_path "/home/xxx/Neural-Scene-Flow-Fields/kid-running/dense/" --resize_height 288
    # Run optical flow model
    ./download_models.sh
    python run_flows_video.py --model models/raft-things.pth --data_path /home/xxx/Neural-Scene-Flow-Fields/kid-running/dense/
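
The preprocessing scripts expect the undistorted COLMAP output described above: a "dense" folder containing "images" and "sparse". A minimal sketch of a sanity check is shown below, together with the aspect-ratio arithmetic implied by --resize_height. Both helper names are hypothetical, and rounding the width to the nearest integer is an assumption; run_midas.py may round differently.

```python
from pathlib import Path


def missing_dense_subfolders(dense_dir):
    """Return the required subfolders that are absent from a COLMAP 'dense' folder."""
    dense = Path(dense_dir)
    return [name for name in ("images", "sparse") if not (dense / name).is_dir()]


def resized_width(orig_width, orig_height, resize_height):
    """Width that preserves the original aspect ratio at the given height.

    Rounding to the nearest integer is an assumption made for this sketch.
    """
    return int(round(orig_width * resize_height / orig_height))


if __name__ == "__main__":
    # Example: a 1920x1080 frame resized to height 288 keeps its 16:9 ratio.
    print(resized_width(1920, 1080, 288))
    # An empty list means both "images" and "sparse" are present.
    print(missing_dense_subfolders("kid-running/dense"))
```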

Rendering from an example pretrained model

Download the pretrained model "kid-running_ndc_5f_sv_of_sm_unify3_F00-30.zip" from link. Unzip it and place the checkpoint at "nsff_exp/logs/kid-running_ndc_5f_sv_of_sm_unify3_F00-30/360000.tar".

Set datadir in configs/config_kid-running.txt to the root directory of the input video. Then go to the directory "nsff_exp":

   cd nsff_exp
   mkdir logs
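
The three rendering commands in the subsections below differ only in one flag and in the target index. As a rough sketch, they can be assembled programmatically; the mode names and the helper `render_command` are hypothetical, while the flags themselves are the ones used in the example commands.

```python
# Flags taken from the three example rendering commands in this README.
RENDER_FLAGS = {
    "viewpoint": "--render_bt",            # fixed time, viewpoint interpolation
    "time": "--render_lockcam_slowmo",     # fixed viewpoint, time interpolation
    "space_time": "--render_slowmo_bt",    # space-time interpolation
}


def render_command(mode, target_idx, config="configs/config_kid-running.txt"):
    """Build the argument list for one rendering mode."""
    if mode not in RENDER_FLAGS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(RENDER_FLAGS)}")
    return ["python", "run_nerf.py", "--config", config,
            RENDER_FLAGS[mode], "--target_idx", str(target_idx)]


if __name__ == "__main__":
    for mode, idx in [("viewpoint", 10), ("time", 8), ("space_time", 10)]:
        print(" ".join(render_command(mode, idx)))
```

Each list can be passed to `subprocess.run(cmd, cwd="nsff_exp", check=True)` to launch the rendering from the experiment directory.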

Rendering of fixed time, viewpoint interpolation

   python run_nerf.py --config configs/config_kid-running.txt --render_bt --target_idx 10

By running the example command, you should get the following result:

Rendering of fixed viewpoint, time interpolation

   python run_nerf.py --config configs/config_kid-running.txt --render_lockcam_slowmo --target_idx 8

By running the example command, you should get the following result:

Rendering of space-time interpolation

   python run_nerf.py --config configs/config_kid-running.txt --render_slowmo_bt  --target_idx 10

By running the example command, you should get the following result:

Training

In configs/config_kid-running.txt, change expname to any name you like (different from the original one), then run the following command to train the model:

    python run_nerf.py --config configs/config_kid-running.txt

Per-scene training takes ~2 days on 4 NVIDIA RTX 2080 Ti GPUs.

Several parameters in the config files are worth knowing when training a good model on in-the-wild video:

final_height: must be the same as the --resize_height argument of run_midas.py; in the kid-running case, it should be 288.
N_samples: to render images at higher resolution, increase the number of sampled points, for example to 256 or 512.
chain_sf: the model enforces local 5-frame consistency if set to True, and 3-frame consistency if set to False. For faster training, set it to False.
start_frame, end_frame: the training frame range. The default model usually works for videos of 1~2 s, and 30-60 frames work best with the default hyperparameters. Training on longer sequences can cause oversmoothed renderings. To mitigate this, you can increase the capacity of the network by increasing netwidth to 512.
decay_iteration: number of iterations in the initialization stage. Data-driven losses decay every 1000 * decay_iteration steps. The code has been updated to calculate the number of decay iterations automatically.
no_ndc: the current implementation only supports reconstruction in NDC space, meaning it only works for forward-facing scenes, same as the original NeRF.
use_motion_mask, num_extra_sample: whether to use the estimated coarse motion segmentation mask for hard-mining sampling during the initialization stage, and how many extra samples to draw during that stage.
w_depth, w_optical_flow: weights of the losses for the single-view depth and geometry consistency priors described in the paper. Weights of (0.4, 0.2) or (0.2, 0.1) usually work best for most videos.
If you see significant ghosting in the final rendering, you might try the suggestion from link.
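
Two of the notes above are easy to check mechanically before a multi-day training run: final_height must match --resize_height, and long frame ranges may call for a wider network. The sketch below assumes the config file uses simple "key = value" lines and that the frame count is end_frame minus start_frame; both are assumptions, and the function names and the example experiment name are hypothetical.

```python
def parse_config(text):
    """Parse 'key = value' lines into a dict of strings (comments and blanks skipped)."""
    values = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def check_training_config(cfg, resize_height):
    """Return a list of warnings derived from the parameter notes above."""
    warnings = []
    if "final_height" in cfg and int(cfg["final_height"]) != resize_height:
        warnings.append("final_height must match --resize_height used in run_midas.py")
    if "start_frame" in cfg and "end_frame" in cfg:
        # Assumption: frame count is end_frame - start_frame.
        n_frames = int(cfg["end_frame"]) - int(cfg["start_frame"])
        if n_frames > 60:
            warnings.append("more than 60 frames: consider increasing netwidth to 512")
    return warnings


EXAMPLE = """
expname = my-kid-running   # hypothetical experiment name
final_height = 288
start_frame = 0
end_frame = 30
"""

if __name__ == "__main__":
    print(check_training_config(parse_config(EXAMPLE), resize_height=288))
```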

Evaluation on the Dynamic Scene Dataset

Download the Dynamic Scene dataset "dynamic_scene_data_full.zip" from link.
Download the pretrained models "dynamic_scene_pretrained_models.zip" from link, unzip them, and put them in the folder "nsff_exp/logs/".
Run the following command for each scene to get the quantitative results reported in the paper:

   # Usage: configs/config_xxx.txt indicates each scene name such as config_balloon1-2.txt in nsff/configs
   python evaluation.py --config configs/config_xxx.txt

Note: you have to use the modified LPIPS implementation included in this branch in order to measure the LPIPS error for the dynamic region only, as described in the paper.
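
Since the evaluation command is run once per scene, a small loop over the config files can save typing. This is only a sketch: the helper name is hypothetical, and it picks up every config_*.txt file in the given directory, so configs that do not belong to the Dynamic Scene dataset (such as config_kid-running.txt) would need to be filtered out.

```python
from pathlib import Path


def evaluation_commands(config_dir):
    """Build one evaluation command per config_*.txt file found in config_dir."""
    configs = sorted(Path(config_dir).glob("config_*.txt"))
    return [["python", "evaluation.py", "--config", path.as_posix()] for path in configs]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for cmd in evaluation_commands("configs"):
        print(" ".join(cmd))
```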

Acknowledgment

The code is based on implementations of several prior works.

License

This repository is released under the MIT license.

Citation

If you find our code/models useful, please consider citing our paper:

@InProceedings{li2020neural,
  title={Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes},
  author={Li, Zhengqi and Niklaus, Simon and Snavely, Noah and Wang, Oliver},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021}
}

Owner

Zhengqi Li
