[3DV 2021] Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation

Last update: Dec 30, 2022

Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation

This is the official implementation for the method described in

Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation

Jiaxing Yan, Hong Zhao, Penghui Bu and YuSheng Jin.

3DV 2021 (arXiv pdf)

Setup

Assuming a fresh Anaconda distribution, you can install the dependencies with:

conda install pytorch=1.7.0 torchvision=0.8.1 -c pytorch
pip install tensorboardX==2.1
pip install opencv-python==3.4.7.28
pip install albumentations==0.5.2   # we use albumentations for faster image preprocessing

This project uses Python 3.7.8 and CUDA 11.4. The experiments were conducted on a single NVIDIA RTX 3090 GPU with an Intel Core i9-9900KF CPU.

We recommend using a conda environment to avoid dependency conflicts.

Prediction for a single image

You can predict scaled disparity for a single image with:

python test_simple.py --image_path images/test_image.jpg --model_name MS_1024x320

On its first run, this command downloads the MS_1024x320 pretrained model (272MB) into the models/ folder. We provide the following options for --model_name:

`--model_name`	Training modality	Resolution	Abs_Rel	Sq_Rel	$\delta<1.25$
`M_640x192`	Mono	640 x 192	0.105	0.769	0.892
`M_1024x320`	Mono	1024 x 320	0.102	0.734	0.898
`M_1280x384`	Mono	1280 x 384	0.102	0.715	0.900
`MS_640x192`	Mono + Stereo	640 x 192	0.102	0.752	0.894
`MS_1024x320`	Mono + Stereo	1024 x 320	0.096	0.694	0.908
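
Scaled disparity is not metric depth. As a minimal sketch, assuming this code keeps the Monodepth2 convention in which the network's sigmoid output in [0, 1] is mapped linearly onto a disparity range set by a minimum and maximum depth, the relation between the two could be written as follows. The function name and the 0.1 and 100 defaults are assumptions borrowed from Monodepth2, not values confirmed for this repository.

```python
def disp_to_depth(disp, min_depth=0.1, max_depth=100.0):
    """Map a network output in [0, 1] to (scaled_disp, depth).

    Follows the Monodepth2 convention; min_depth and max_depth are
    assumed defaults, not values confirmed for this repository.
    """
    min_disp = 1.0 / max_depth   # disparity of the farthest point
    max_disp = 1.0 / min_depth   # disparity of the nearest point
    scaled_disp = min_disp + (max_disp - min_disp) * disp
    depth = 1.0 / scaled_disp
    return scaled_disp, depth


if __name__ == "__main__":
    for d in (0.0, 0.5, 1.0):
        scaled, depth = disp_to_depth(d)
        print(f"disp={d:.1f} -> scaled_disp={scaled:.3f}, depth={depth:.3f}")
```

For models trained on monocular video only, the resulting depth is defined up to an unknown scale factor.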

KITTI training data

You can download the entire raw KITTI dataset by running:

wget -i splits/kitti_archives_to_download.txt -P kitti_data/

Then unzip with

cd kitti_data
unzip "*.zip"
cd ..
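
If the unzip tool is not available (for example on Windows), the same step can be sketched with Python's standard library. The helper name extract_all is hypothetical and is not part of this repository.

```python
import zipfile
from pathlib import Path


def extract_all(archive_dir):
    """Extract every .zip archive found in archive_dir into archive_dir.

    Returns the names of the archives that were extracted, in sorted order.
    """
    archive_dir = Path(archive_dir)
    extracted = []
    for archive in sorted(archive_dir.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive_dir)
        extracted.append(archive.name)
    return extracted
```

Calling extract_all("kitti_data") from the repository root mirrors the shell commands above.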

Splits

The train/test/validation splits are defined in the splits/ folder. By default, the code will train a depth model using Zhou's subset of the standard Eigen split of KITTI, which is designed for monocular training. You can also train a model using the new benchmark split or the odometry split by setting the --split flag.
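
To inspect a split, the sketch below parses one entry. It assumes the split files follow the Monodepth2 layout, where each line holds a sequence folder, a frame index and a camera side ("l" or "r") separated by whitespace. Both the layout and the helper name parse_split_line are assumptions, so check a file in splits/ before relying on it.

```python
def parse_split_line(line):
    """Parse one split entry of the assumed form '<folder> <frame_index> <side>'."""
    folder, frame_index, side = line.split()
    if side not in ("l", "r"):
        raise ValueError(f"unexpected camera side: {side!r}")
    return folder, int(frame_index), side


if __name__ == "__main__":
    # Example entry in the assumed Monodepth2-style format.
    print(parse_split_line("2011_09_26/2011_09_26_drive_0022_sync 473 r"))
```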

Training

Monocular training:

python train.py --model_name mono_model

Stereo training:

Our code defaults to using Zhou's subsampled Eigen training data. For stereo-only training we have to specify that we want to use the full Eigen training set.

python train.py --model_name stereo_model \
  --frame_ids 0 --use_stereo --split eigen_full

Monocular + stereo training:

python train.py --model_name mono+stereo_model \
  --frame_ids 0 -1 1 --use_stereo

Note: For high-resolution input (e.g. 1024x320 and 1280x384), we use a lightweight setup for the pose encoder during training, ResNet18 at 640x192, to save memory. The following example command trains a model named M_1024x320:

python train.py --model_name M_1024x320 --num_layers 50 --height 320 --width 1024 --num_layers_pose 18 --height_pose 192 --width_pose 640
#             encoder     resolution                                     
# DepthNet   resnet50      1024x320
# PoseNet    resnet18       640x192
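
All resolutions used in this README are multiples of 32. Monodepth2 asserts this for its inputs, and the sketch below assumes this repository inherits the same constraint for both the depth and the pose network. The helper check_resolution is hypothetical and only illustrates the check before launching a long training run.

```python
def check_resolution(height, width, multiple=32):
    """Return True if both dimensions are divisible by `multiple`."""
    return height % multiple == 0 and width % multiple == 0


if __name__ == "__main__":
    # (height, width) pairs taken from the commands and tables in this README.
    for height, width in [(192, 640), (320, 1024), (384, 1280)]:
        print(f"{width}x{height}: {check_resolution(height, width)}")
```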

Finetuning a pretrained model

To load an existing model for finetuning, pass --load_weights_folder to the training command:

python train.py --model_name finetuned_mono --load_weights_folder ~/tmp/mono_model/models/weights_19

Other training options

Run python train.py -h (or look at options.py) to see the range of other training options, such as learning rates and ablation settings.

KITTI evaluation

To prepare the ground truth depth maps run:

python export_gt_depth.py --data_path kitti_data --split eigen
python export_gt_depth.py --data_path kitti_data --split eigen_benchmark

...assuming that you have placed the KITTI dataset in the default location of ./kitti_data/.

The following example command evaluates the weights of a model named MS_1024x320:

python evaluate_depth.py --load_weights_folder ./log/MS_1024x320 --eval_mono --data_path ./kitti_data --eval_split eigen
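
The metrics in the model table (Abs_Rel, Sq_Rel and δ<1.25) follow the standard definitions for depth evaluation. The pure-Python sketch below illustrates them on flat lists of valid ground-truth and predicted depths. It is not the code in evaluate_depth.py, and the function names are hypothetical. The median_scale helper reflects the median scaling that Monodepth2 applies when evaluating monocular models; that this repository does the same under --eval_mono is an assumption.

```python
from statistics import median


def median_scale(gt, pred):
    """Rescale predictions so their median matches the ground-truth median."""
    ratio = median(gt) / median(pred)
    return [p * ratio for p in pred]


def depth_metrics(gt, pred):
    """Return (abs_rel, sq_rel, a1) for equal-length lists of positive depths."""
    if len(gt) != len(pred) or len(gt) == 0:
        raise ValueError("gt and pred must be non-empty and of equal length")
    n = len(gt)
    pairs = list(zip(gt, pred))
    abs_rel = sum(abs(g - p) / g for g, p in pairs) / n
    sq_rel = sum((g - p) ** 2 / g for g, p in pairs) / n
    # a1 is the fraction of pixels whose ratio error is below 1.25.
    a1 = sum(1 for g, p in pairs if max(g / p, p / g) < 1.25) / n
    return abs_rel, sq_rel, a1


if __name__ == "__main__":
    gt = [2.0, 4.0, 6.0]
    pred = [1.1, 2.0, 2.9]  # roughly half the true depth, as from a mono model
    print(depth_metrics(gt, median_scale(gt, pred)))
```

Lower is better for Abs_Rel and Sq_Rel; higher is better for δ<1.25.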

Precomputed results

You can download our precomputed disparity predictions from the following links:

Training modality	Input size	`.npy` filesize	Eigen disparities
Mono	640 x 192	326M	Download 🔗
Mono	1024 x 320	871M	Download 🔗
Mono	1280 x 384	1.27G	Download 🔗
Mono + Stereo	640 x 192	326M	Download 🔗
Mono + Stereo	1024 x 320	871M	Download 🔗

References

Monodepth2 - https://github.com/nianticlabs/monodepth2

