Pytorch version of SfmLearner from Tinghui Zhou et al.

Overview

SfMLearner Pytorch version

This codebase implements the system described in the paper:

Unsupervised Learning of Depth and Ego-Motion from Video

Tinghui Zhou, Matthew Brown, Noah Snavely, David G. Lowe

In CVPR 2017 (Oral).

See the project webpage for more details.

Original Author : Tinghui Zhou ([email protected]) Pytorch implementation : Clément Pinard ([email protected])

sample_results

Preamble

This codebase was developed and tested with Pytorch 1.0.1, CUDA 10 and Ubuntu 16.04. Original code was developped in tensorflow, you can access it here

Prerequisite

pip3 install -r requirements.txt

or install manually the following packages :

pytorch >= 1.0.1
pebble
matplotlib
imageio
scipy
argparse
tensorboardX
blessings
progressbar2
path.py

Note

Because it uses latests pytorch features, it is not compatible with anterior versions of pytorch.

If you don't have an up to date pytorch, the tags can help you checkout the right commits corresponding to your pytorch version.

What has been done

  • Training has been tested on KITTI and CityScapes.
  • Dataset preparation has been largely improved, and now stores image sequences in folders, making sure that movement is each time big enough between each frame
  • That way, training is now significantly faster, running at ~0.14sec per step vs ~0.2s per steps initially (on a single GTX980Ti)
  • In addition you don't need to prepare data for a particular sequence length anymore as stacking is made on the fly.
  • You can still choose the former stacked frames dataset format.
  • Convergence is now almost as good as original paper with same hyper parameters
  • You can know compare with groud truth for your validation set. It is still possible to validate without, but you now can see that minimizing photometric error is not equivalent to optimizing depth map.

Differences with official Implementation

  • Smooth Loss is different from official repo. Instead of applying it to disparity, we apply it to depth. Original disparity smooth loss did not work well (don't know why !) and it did not even converge at all with weight values used (0.5).
  • loss is divided by 2.3 when downscaling instead of 2. This is the results of empiric experiments, so the optimal value is clearly not carefully determined.
  • As a consequence, with a smooth loss of 2.0̀, depth test is better, but Pose test is worse. To revert smooth loss back to original, you can change it here

Preparing training data

Preparation is roughly the same command as in the original code.

For KITTI, first download the dataset using this script provided on the official website, and then run the following command. The --with-depth option will save resized copies of groundtruth to help you setting hyper parameters. The --with-pose will dump the sequence pose in the same format as Odometry dataset (see pose evaluation)

python3 data/prepare_train_data.py /path/to/raw/kitti/dataset/ --dataset-format 'kitti' --dump-root /path/to/resulting/formatted/data/ --width 416 --height 128 --num-threads 4 [--static-frames /path/to/static_frames.txt] [--with-depth] [--with-pose]

For Cityscapes, download the following packages: 1) leftImg8bit_sequence_trainvaltest.zip, 2) camera_trainvaltest.zip. You will probably need to contact the administrators to be able to get it. Then run the following command

python3 data/prepare_train_data.py /path/to/cityscapes/dataset/ --dataset-format 'cityscapes' --dump-root /path/to/resulting/formatted/data/ --width 416 --height 171 --num-threads 4

Notice that for Cityscapes the img_height is set to 171 because we crop out the bottom part of the image that contains the car logo, and the resulting image will have height 128.

Training

Once the data are formatted following the above instructions, you should be able to train the model by running the following command

python3 train.py /path/to/the/formatted/data/ -b4 -m0.2 -s0.1 --epoch-size 3000 --sequence-length 3 --log-output [--with-gt]

You can then start a tensorboard session in this folder by

tensorboard --logdir=checkpoints/

and visualize the training progress by opening https://localhost:6006 on your browser. If everything is set up properly, you should start seeing reasonable depth prediction after ~30K iterations when training on KITTI.

Evaluation

Disparity map generation can be done with run_inference.py

python3 run_inference.py --pretrained /path/to/dispnet --dataset-dir /path/pictures/dir --output-dir /path/to/output/dir

Will run inference on all pictures inside dataset-dir and save a jpg of disparity (or depth) to output-dir for each one see script help (-h) for more options.

Disparity evaluation is avalaible

python3 test_disp.py --pretrained-dispnet /path/to/dispnet --pretrained-posenet /path/to/posenet --dataset-dir /path/to/KITTI_raw --dataset-list /path/to/test_files_list

Test file list is available in kitti eval folder. To get fair comparison with Original paper evaluation code, don't specify a posenet. However, if you do, it will be used to solve the scale factor ambiguity, the only ground truth used to get it will be vehicle speed which is far more acceptable for real conditions quality measurement, but you will obviously get worse results.

Pose evaluation is also available on Odometry dataset. Be sure to download both color images and pose !

python3 test_pose.py /path/to/posenet --dataset-dir /path/to/KITIT_odometry --sequences [09]

ATE (Absolute Trajectory Error) is computed as long as RE for rotation (Rotation Error). RE between R1 and R2 is defined as the angle of R1*R2^-1 when converted to axis/angle. It corresponds to RE = arccos( (trace(R1 @ R2^-1) - 1) / 2). While ATE is often said to be enough to trajectory estimation, RE seems important here as sequences are only seq_length frames long.

Pretrained Nets

Avalaible here

Arguments used :

python3 train.py /path/to/the/formatted/data/ -b4 -m0 -s2.0 --epoch-size 1000 --sequence-length 5 --log-output --with-gt

Depth Results

Abs Rel Sq Rel RMSE RMSE(log) Acc.1 Acc.2 Acc.3
0.181 1.341 6.236 0.262 0.733 0.901 0.964

Pose Results

5-frames snippets used

Seq. 09 Seq. 10
ATE 0.0179 (std. 0.0110) 0.0141 (std. 0.0115)
RE 0.0018 (std. 0.0009) 0.0018 (std. 0.0011)

Discussion

Here I try to link the issues that I think raised interesting questions about scale factor, pose inference, and training hyperparameters

  • Issue 48 : Why is target frame at the center of the sequence ?
  • Issue 39 : Getting pose vector without the scale factor uncertainty
  • Issue 46 : Is Interpolated groundtruth better than sparse groundtruth ?
  • Issue 45 : How come the inverse warp is absolute and pose and depth are only relative ?
  • Issue 32 : Discussion about validation set, and optimal batch size
  • Issue 25 : Why filter out static frames ?
  • Issue 24 : Filtering pixels out of the photometric loss
  • Issue 60 : Inverse warp is only one way !

Other Implementations

TensorFlow by tinghuiz (original code, and paper author)

Owner
Clément Pinard
PhD ENSTA Paris, Deep Learning Engineer @ ContentSquare
Clément Pinard
Simple Linear 2nd ODE Solver GUI - A 2nd constant coefficient linear ODE solver with simple GUI using euler's method

Simple_Linear_2nd_ODE_Solver_GUI Description It is a 2nd constant coefficient li

:) 4 Feb 05, 2022
Reproduces the results of the paper "Finite Basis Physics-Informed Neural Networks (FBPINNs): a scalable domain decomposition approach for solving differential equations".

Finite basis physics-informed neural networks (FBPINNs) This repository reproduces the results of the paper Finite Basis Physics-Informed Neural Netwo

Ben Moseley 65 Dec 28, 2022
Neural HMMs are all you need (for high-quality attention-free TTS)

Neural HMMs are all you need (for high-quality attention-free TTS) Shivam Mehta, Éva Székely, Jonas Beskow, and Gustav Eje Henter This is the official

Shivam Mehta 0 Oct 28, 2022
Official implementation of Monocular Quasi-Dense 3D Object Tracking

Monocular Quasi-Dense 3D Object Tracking Monocular Quasi-Dense 3D Object Tracking (QD-3DT) is an online framework detects and tracks objects in 3D usi

Visual Intelligence and Systems Group 441 Dec 20, 2022
Implementation of "Deep Implicit Templates for 3D Shape Representation"

Deep Implicit Templates for 3D Shape Representation Zerong Zheng, Tao Yu, Qionghai Dai, Yebin Liu. arXiv 2020. This repository is an implementation fo

Zerong Zheng 144 Dec 07, 2022
Churn-Prediction-Project - In this project, a churn prediction model is developed for a private bank as a term project for Data Mining class.

Churn-Prediction-Project In this project, a churn prediction model is developed for a private bank as a term project for Data Mining class. Project in

1 Jan 03, 2022
CaFM-pytorch ICCV ACCEPT Introduction of dataset VSD4K

CaFM-pytorch ICCV ACCEPT Introduction of dataset VSD4K Our dataset VSD4K includes 6 popular categories: game, sport, dance, vlog, interview and city.

96 Jul 05, 2022
Image-to-Image Translation with Conditional Adversarial Networks (Pix2pix) implementation in keras

pix2pix-keras Pix2pix implementation in keras. Original paper: Image-to-Image Translation with Conditional Adversarial Networks (pix2pix) Paper Author

William Falcon 141 Dec 30, 2022
Relative Positional Encoding for Transformers with Linear Complexity

Stochastic Positional Encoding (SPE) This is the source code repository for the ICML 2021 paper Relative Positional Encoding for Transformers with Lin

Antoine Liutkus 48 Nov 16, 2022
Context Axial Reverse Attention Network for Small Medical Objects Segmentation

CaraNet: Context Axial Reverse Attention Network for Small Medical Objects Segmentation This repository contains the implementation of a novel attenti

401 Dec 23, 2022
Hardware-accelerated DNN model inference ROS2 packages using NVIDIA Triton/TensorRT for both Jetson and x86_64 with CUDA-capable GPU

Isaac ROS DNN Inference Overview This repository provides two NVIDIA GPU-accelerated ROS2 nodes that perform deep learning inference using custom mode

NVIDIA Isaac ROS 62 Dec 14, 2022
Not All Points Are Equal: Learning Highly Efficient Point-based Detectors for 3D LiDAR Point Clouds (CVPR 2022, Oral)

Not All Points Are Equal: Learning Highly Efficient Point-based Detectors for 3D LiDAR Point Clouds (CVPR 2022, Oral) This is the official implementat

Yifan Zhang 259 Dec 25, 2022
Repository for "Toward Practical Monocular Indoor Depth Estimation" (CVPR 2022)

Toward Practical Monocular Indoor Depth Estimation Cho-Ying Wu, Jialiang Wang, Michael Hall, Ulrich Neumann, Shuochen Su [arXiv] [project site] DistDe

Meta Research 122 Dec 13, 2022
✨✨✨An awesome open source toolbox for stereo matching.

OpenStereo This is an awesome open source toolbox for stereo matching. Supported Methods: BM SGM(T-PAMI'07) GCNet(ICCV'17) PSMNet(CVPR'18) StereoNet(E

Wang Qingyu 6 Nov 04, 2022
Tutorial on scikit-learn and IPython for parallel machine learning

Parallel Machine Learning with scikit-learn and IPython Video recording of this tutorial given at PyCon in 2013. The tutorial material has been rearra

Olivier Grisel 1.6k Dec 26, 2022
MetaDrive: Composing Diverse Scenarios for Generalizable Reinforcement Learning

MetaDrive: Composing Diverse Driving Scenarios for Generalizable RL [ Documentation | Demo Video ] MetaDrive is a driving simulator with the following

DeciForce: Crossroads of Machine Perception and Autonomy 276 Jan 04, 2023
Multi-scale discriminator feature-wise loss function

Multi-Scale Discriminative Feature Loss This repository provides code for Multi-Scale Discriminative Feature (MDF) loss for image reconstruction algor

Graphics and Displays group - University of Cambridge 76 Dec 12, 2022
Awesome Remote Sensing Toolkit based on PaddlePaddle.

基于飞桨框架开发的高性能遥感图像处理开发套件,端到端地完成从训练到部署的全流程遥感深度学习应用。 最新动态 PaddleRS 即将发布alpha版本!欢迎大家试用 简介 PaddleRS是遥感科研院所、相关高校共同基于飞桨开发的遥感处理平台,支持遥感图像分类,目标检测,图像分割,以及变化检测等常用遥

146 Dec 11, 2022
Interactive dimensionality reduction for large datasets

BlosSOM 🌼 BlosSOM is a graphical environment for running semi-supervised dimensionality reduction with EmbedSOM. You can use it to explore multidimen

19 Dec 14, 2022
Self-Supervised Pillar Motion Learning for Autonomous Driving (CVPR 2021)

Self-Supervised Pillar Motion Learning for Autonomous Driving Chenxu Luo, Xiaodong Yang, Alan Yuille Self-Supervised Pillar Motion Learning for Autono

QCraft 101 Dec 05, 2022