ScaleNet: A Shallow Architecture for Scale Estimation

Repository for the code of the ScaleNet paper:

"ScaleNet: A Shallow Architecture for Scale Estimation".
Axel Barroso-Laguna, Yurun Tian, and Krystian Mikolajczyk. arXiv, 2021.

[Paper on arXiv]

Prerequisites

Python 3.7 is required to run and train ScaleNet. Use Conda to install the dependencies:

conda create --name scalenet_env python=3.7
conda activate scalenet_env 
conda install pytorch==1.2.0 -c pytorch
conda install -c conda-forge tensorboardx opencv tqdm 
conda install -c anaconda pandas 
conda install -c pytorch torchvision 

Scale estimation

run_scalenet.py estimates the scale factor between two input images. Two example images, im1.jpg and im2.jpg, are provided in the assets/im_test folder. For a quick test, run:

python run_scalenet.py --im1_path assets/im_test/im1.jpg --im2_path assets/im_test/im2.jpg

Arguments:

  • im1_path: Path to image A.
  • im2_path: Path to image B.

The script returns the scale factor from image A to image B (A->B).
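
One way such a factor might be used is to resize image A so that its content roughly matches the apparent size in image B. The sketch below is only an illustration. It assumes that a factor s means content in B appears s times larger than in A, which this README does not state, and the helper names rescaled_size and invert_scale are hypothetical, not part of this repository.

```python
def rescaled_size(width, height, scale_a_to_b):
    """Return the (width, height) image A would have after resizing.

    Assumption: a factor of 2.0 means content in B appears twice as
    large as in A, so A is enlarged by 2.0 to compensate.
    """
    if scale_a_to_b <= 0:
        raise ValueError("scale factor must be positive")
    # Never let a dimension collapse to zero pixels.
    new_w = max(1, round(width * scale_a_to_b))
    new_h = max(1, round(height * scale_a_to_b))
    return new_w, new_h


def invert_scale(scale_a_to_b):
    """Factor in the opposite direction (B->A) under the same convention."""
    return 1.0 / scale_a_to_b
```

Under this assumed convention, rescaled_size(640, 480, 2.0) gives the size to which image A would be enlarged, and invert_scale gives the factor to apply to image B instead.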

Training ScaleNet

We provide a list of MegaDepth image pairs and their scale factors in the assets folder. We use the undistorted images and the corresponding camera intrinsics and extrinsics preprocessed by D2-Net. You can download them directly from the D2-Net repository. To train with the default configuration, run:

python train_ScaleNet.py --image_data_path /path/to/megadepth_d2net

There are, however, some important arguments to consider when training ScaleNet.

Arguments:

  • image_data_path: Path to the undistorted MegaDepth images from D2-Net.
  • save_processed_im: ScaleNet center-crops the images and resizes them to a default resolution. The processed images can optionally be stored and loaded during training, which makes training much faster. However, the files can be large, so we suggest storing them on a disk with plenty of free space. Default: True.
  • root_precomputed_files: Path to save the processed image pairs.
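
The caching behaviour described by save_processed_im and root_precomputed_files can be sketched roughly as follows. This is not the code in train_ScaleNet.py: the square crop, the file naming scheme, and the helper names (center_crop_box, cached_path, load_or_process) are all assumptions made for illustration.

```python
from pathlib import Path


def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square.

    Assumption: the crop is square; the README only says "center-cropped".
    """
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


def cached_path(root_precomputed_files, image_name, resolution):
    """Hypothetical naming scheme for a processed image stored on disk."""
    stem = Path(image_name).stem
    return Path(root_precomputed_files) / f"{stem}_{resolution}.npy"


def load_or_process(path, process_fn, load_fn, save_fn, save_processed_im=True):
    """Load a cached result if present; otherwise compute and optionally store it."""
    if save_processed_im and path.exists():
        return load_fn(path)
    result = process_fn()
    if save_processed_im:
        path.parent.mkdir(parents=True, exist_ok=True)
        save_fn(path, result)
    return result
```

The point of the pattern is that the expensive crop-and-resize step runs once per image; later epochs only pay the cost of reading the stored file.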

To modify ScaleNet's training or architecture, see the full list of arguments in the train_ScaleNet.py script.

Test ScaleNet - camera pose

In addition to the training code, we provide a template for testing ScaleNet on the camera pose task. The file assets/data/test.csv lists the test MegaDepth pairs along with their scale changes and camera poses.

Run the following command to test ScaleNet + SIFT on our custom camera pose split:

python test_camera_pose.py --image_data_path /path/to/megadepth_d2net

The test_camera_pose.py script is intended to show the structure of our camera pose experiment. You can swap in a different local feature extractor or scale estimator and obtain your own camera pose results.
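
A pluggable structure of this kind might look like the sketch below. It is not the script in this repository: the function run_pair, its parameters, and the order of the steps are hypothetical, and the idea of passing the estimated scale to the feature extractor for image A is an assumption about how a scale estimator would be combined with local features.

```python
def run_pair(im1, im2, estimate_scale, extract_features, match, estimate_pose):
    """Run one image pair through a pipeline with swappable components.

    Every callable is a placeholder supplied by the caller:
      estimate_scale(im1, im2)      -> scale factor A->B (e.g. ScaleNet)
      extract_features(image, s)    -> features after correcting by s (e.g. SIFT)
      match(feats1, feats2)         -> correspondences
      estimate_pose(matches)        -> relative camera pose
    """
    scale = estimate_scale(im1, im2)
    feats1 = extract_features(im1, scale)  # image A, scale-corrected
    feats2 = extract_features(im2, 1.0)    # image B, left unchanged
    matches = match(feats1, feats2)
    return estimate_pose(matches)
```

Because each stage is passed in as a callable, replacing the extractor or the scale estimator means changing a single argument rather than editing the pipeline itself.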

BibTeX

If you use this code or the provided training/testing pairs in your research, please cite our paper:

@InProceedings{Barroso-Laguna2021_scale,
    author = {Barroso-Laguna, Axel and Tian, Yurun and Mikolajczyk, Krystian},
    title = {{ScaleNet: A Shallow Architecture for Scale Estimation}},
    booktitle = {Arxiv: },
    year = {2021},
}