Single-stage Keypoint-based Category-level Object Pose Estimation from an RGB Image

Overview

CenterPose

Overview

This repository is the official implementation of the paper "Single-stage Keypoint-based Category-level Object Pose Estimation from an RGB Image" by Lin et al. (full citation below). In this work, we propose a single-stage, keypoint-based approach for category-level object pose estimation, which operates on unknown object instances within a known category using a single RGB image input. The proposed network performs 2D object detection, detects 2D keypoints, estimates 6-DoF pose, and regresses relative 3D bounding cuboid dimensions. These quantities are estimated in a sequential fashion, leveraging the recent idea of convGRU for propagating information from easier tasks to those that are more difficult. We favor simplicity in our design choices: generic cuboid vertex coordinates, a single-stage network, and monocular RGB input. We conduct extensive experiments on the challenging Objectron benchmark of real images, outperforming state-of-the-art methods for 3D IoU metric (27.6% higher than the single-stage approach of MobilePose and 7.1% higher than the related two-stage approach). The algorithm runs at 15 fps on an NVIDIA GTX 1080Ti GPU.

Installation

The code was tested on Ubuntu 16.04, with Anaconda Python 3.6 and PyTorch 1.1.0. Higher versions should be possible with some accuracy difference. NVIDIA GPUs are needed for both training and testing.

  1. Clone this repo:

    CenterPose_ROOT=/path/to/clone/CenterPose
    git clone https://github.com/NVlabs/CenterPose.git $CenterPose_ROOT
    
  2. Create an Anaconda environment or create your own virtual environment

    conda create -n CenterPose python=3.6
    conda activate CenterPose
    pip install -r requirements.txt
    conda install -c conda-forge eigenpy
    
  3. Compile the deformable convolutional layer

    git submodule init
    git submodule update
    cd $CenterPose_ROOT/src/lib/models/networks/DCNv2
    ./make.sh
    

    [Optional] If you want to use a higher version of PyTorch, you need to download the latest version of DCNv2 and compile the library.

    git submodule set-url https://github.com/jinfagang/DCNv2_latest.git src/lib/models/networks/DCNv2
    git submodule sync
    git submodule update --init --recursive --remote
    cd $CenterPose_ROOT/src/lib/models/networks/DCNv2
    ./make.sh
    
  4. Download our pre-trained models for CenterPose and move all the .pth files to $CenterPose_ROOT/models/CenterPose/. We currently provide models for 9 categories: bike, book, bottle, camera, cereal_box, chair, cup, laptop, and shoe.

  5. Prepare training/testing data

    We save all the training/testing data under $CenterPose_ROOT/data/.

    For the Objectron dataset, we created our own data pre-processor to extract the data for training/testing. Refer to the data directory for more details.

Demo

We provide supporting demos for image, videos, webcam, and image folders. See $CenterPose_ROOT/images/CenterPose

For category-level 6-DoF object estimation on images/video/image folders, run:

cd $CenterPose_ROOT/src
python demo.py --demo /path/to/image/or/folder/or/video --arch dlav1_34 --load_model ../path/to/model

You can also enable --debug 4 to save all the intermediate and final outputs.

For the webcam demo (You may want to specify the camera intrinsics via --cam_intrinsic), run

cd $CenterPose_ROOT/src
python demo.py --demo webcam --arch dlav1_34 --load_model ../path/to/model

Training

We follow the approach of CenterNet for training the DLA network, reducing the learning rate by 10x after epoch 90 and 120, and stopping after 140 epochs.

For debug purposes, you can put all the local training params in the $CenterPose_ROOT/src/main_CenterPose.py script. You can also use the command line instead. More options are in $CenterPose_ROOT/src/lib/opts.py.

To start a new training job, simply do the following, which will use default parameter settings:

cd $CenterPose_ROOT/src
python main_CenterPose.py

The result will be saved in $CenterPose_ROOT/exp/object_pose/$dataset_$category_$arch_$time ,e.g., objectron_bike_dlav1_34_2021-02-27-15-33

You could then use tensorboard to visualize the training process via

cd $path/to/folder
tensorboard --logdir=logs --host=XX.XX.XX.XX

Evaluation

We evaluate our method on the Objectron dataset, please refer to the objectron_eval directory for more details.

Citation

Please cite grasp_primitiveShape if you use this repository in your publications:

@article{lin2021single,
  title={Single-stage Keypoint-based Category-level Object Pose Estimation from an RGB Image},
  author={Lin, Yunzhi and Tremblay, Jonathan and Tyree, Stephen and Vela, Patricio A and Birchfield, Stan},
  journal={arXiv preprint arXiv:2109.06161},
  year={2021}
}

Licence

CenterPose is licensed under the NVIDIA Source Code License - Non-commercial.

Owner
NVIDIA Research Projects
NVIDIA Research Projects
DeceFL: A Principled Decentralized Federated Learning Framework

DeceFL: A Principled Decentralized Federated Learning Framework This repository comprises codes that reproduce experiments in Ye, et al (2021), which

Huazhong Artificial Intelligence Lab (HAIL) 10 May 31, 2022
Face and other object detection using OpenCV and ML Yolo

Object-and-Face-Detection-Using-Yolo- Opencv and YOLO object and face detection is implemented. You only look once (YOLO) is a state-of-the-art, real-

Happy N. Monday 3 Feb 15, 2022
A working implementation of the Categorical DQN (Distributional RL).

Categorical DQN. Implementation of the Categorical DQN as described in A distributional Perspective on Reinforcement Learning. Thanks to @tudor-berari

Florin Gogianu 98 Sep 20, 2022
A simple python program that can be used to implement user authentication tokens into your program...

token-generator A simple python module that can be used by developers to implement user authentication tokens into your program... code examples creat

octo 6 Apr 18, 2022
Vehicle detection using machine learning and computer vision techniques for Udacity's Self-Driving Car Engineer Nanodegree.

Vehicle Detection Video demo Overview Vehicle detection using these machine learning and computer vision techniques. Linear SVM HOG(Histogram of Orien

hata 1.1k Dec 18, 2022
End-To-End Optimization of LiDAR Beam Configuration

End-To-End Optimization of LiDAR Beam Configuration arXiv | IEEE Xplore This repository is the official implementation of the paper: End-To-End Optimi

Niclas 30 Nov 28, 2022
Official PyTorch Implementation of Mask-aware IoU and maYOLACT Detector [BMVC2021]

The official implementation of Mask-aware IoU and maYOLACT detector. Our implementation is based on mmdetection. Mask-aware IoU for Anchor Assignment

Kemal Oksuz 46 Sep 29, 2022
Ejemplo Algoritmo Viterbi - Example of a Viterbi algorithm applied to a hidden Markov model on DNA sequence

Ejemplo Algoritmo Viterbi Ejemplo de un algoritmo Viterbi aplicado a modelo ocul

Mateo Velásquez Molina 1 Jan 10, 2022
pytorch implementation for PointNet

PointNet.pytorch This repo is implementation for PointNet in pytorch. The model is in pointnet/model.py. It is teste

Fei Xia 1.7k Dec 30, 2022
Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval (NeurIPS'21)

Baleen Baleen is a state-of-the-art model for multi-hop reasoning, enabling scalable multi-hop search over massive collections for knowledge-intensive

Stanford Future Data Systems 22 Dec 05, 2022
TransferNet: Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network

TransferNet: Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network Created by Seunghoon Hong, Junhyuk Oh,

42 Jun 29, 2022
Deep ViT Features as Dense Visual Descriptors

dino-vit-features [paper] [project page] Official implementation of the paper "Deep ViT Features as Dense Visual Descriptors". We demonstrate the effe

Shir Amir 113 Dec 24, 2022
Some tentative models that incorporate label propagation to graph neural networks for graph representation learning in nodes, links or graphs.

Some tentative models that incorporate label propagation to graph neural networks for graph representation learning in nodes, links or graphs.

zshicode 1 Nov 18, 2021
A Python parser that takes the content of a text file and then reads it into variables.

Text-File-Parser A Python parser that takes the content of a text file and then reads into variables. Input.text File 1. What is your ***? 1. 18 -

Kelvin 0 Jul 26, 2021
Addition of pseudotorsion caclulation eta, theta, eta', and theta' to barnaba package

Addition to Original Barnaba Code: This is modified version of Barnaba package to calculate RNA pseudotorsion angles eta, theta, eta', and theta'. Ple

Mandar Kulkarni 1 Jan 11, 2022
1st place solution to the Satellite Image Change Detection Challenge hosted by SenseTime

1st place solution to the Satellite Image Change Detection Challenge hosted by SenseTime

Lihe Yang 209 Jan 01, 2023
Get a Grip! - A robotic system for remote clinical environments.

Get a Grip! Within clinical environments, sterilization is an essential procedure for disinfecting surgical and medical instruments. For our engineeri

Jay Sharma 1 Jan 05, 2022
Exe-to-xlsm - Simple script to create VBscript of exe and inject to xlsm

🎁 Exe To Office Executable file injection to Office documents: .xlsm, .docm, .p

3 Jan 25, 2022
A map update dataset and benchmark

MUNO21 MUNO21 is a dataset and benchmark for machine learning methods that automatically update and maintain digital street map datasets. Previous dat

16 Nov 30, 2022
Official code for 'Pixel-wise Energy-biased Abstention Learning for Anomaly Segmentationon Complex Urban Driving Scenes'

PEBAL This repo contains the Pytorch implementation of our paper: Pixel-wise Energy-biased Abstention Learning for Anomaly Segmentationon Complex Urba

Yu Tian 115 Dec 29, 2022