This is an official implementation of our CVPR 2021 paper "Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression" (https://arxiv.org/abs/2104.02300)

Related tags

Deep LearningDEKR
Overview

Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression

Introduction

In this paper, we are interested in the bottom-up paradigm of estimating human poses from an image. We study the dense keypoint regression framework that is previously inferior to the keypoint detection and grouping framework. Our motivation is that regressing keypoint positions accurately needs to learn representations that focus on the keypoint regions.

We present a simple yet effective approach, named disentangled keypoint regression (DEKR). We adopt adaptive convolutions through pixel-wise spatial transformer to activate the pixels in the keypoint regions and accordingly learn representations from them. We use a multi-branch structure for separate regression: each branch learns a representation with dedicated adaptive convolutions and regresses one keypoint. The resulting disentangled representations are able to attend to the keypoint regions, respectively, and thus the keypoint regression is spatially more accurate. We empirically show that the proposed direct regression method outperforms keypoint detection and grouping methods and achieves superior bottom-up pose estimation results on two benchmark datasets, COCO and CrowdPose.

Main Results

Results on COCO val2017 without multi-scale test

Backbone Input size #Params GFLOPs AP AP .5 AP .75 AP (M) AP (L) AR AR .5 AR .75 AR (M) AR (L)
pose_hrnet_w32 512x512 29.6M 45.4 0.680 0.867 0.745 0.621 0.777 0.730 0.898 0.784 0.662 0.827
pose_hrnet_w48 640x640 65.7M 141.5 0.710 0.883 0.774 0.667 0.785 0.760 0.914 0.815 0.706 0.840

Results on COCO val2017 with multi-scale test

Backbone Input size #Params GFLOPs AP AP .5 AP .75 AP (M) AP (L) AR AR .5 AR .75 AR (M) AR (L)
pose_hrnet_w32 512x512 29.6M 45.4 0.707 0.877 0.771 0.662 0.778 0.759 0.913 0.813 0.705 0.836
pose_hrnet_w48 640x640 65.7M 141.5 0.723 0.883 0.786 0.686 0.786 0.777 0.924 0.832 0.728 0.849

Results on COCO test-dev2017 without multi-scale test

Backbone Input size #Params GFLOPs AP AP .5 AP .75 AP (M) AP (L) AR AR .5 AR .75 AR (M) AR (L)
pose_hrnet_w32 512x512 29.6M 45.4 0.673 0.879 0.741 0.615 0.761 0.724 0.908 0.782 0.654 0.819
pose_hrnet_w48 640x640 65.7M 141.5 0.700 0.894 0.773 0.657 0.769 0.754 0.927 0.816 0.697 0.832

Results on COCO test-dev2017 with multi-scale test

Backbone Input size #Params GFLOPs AP AP .5 AP .75 AP (M) AP (L) AR AR .5 AR .75 AR (M) AR (L)
pose_hrnet_w32 512x512 29.6M 45.4 0.698 0.890 0.766 0.652 0.765 0.751 0.924 0.811 0.695 0.828
pose_hrnet_w48 640x640 65.7M 141.5 0.710 0.892 0.780 0.671 0.769 0.767 0.932 0.830 0.715 0.839

Results on CrowdPose test without multi-scale test

Method AP AP .5 AP .75 AP (E) AP (M) AP (H)
pose_hrnet_w32 0.657 0.857 0.704 0.730 0.664 0.575
pose_hrnet_w48 0.673 0.864 0.722 0.746 0.681 0.587

Results on CrowdPose test with multi-scale test

Method AP AP .5 AP .75 AP (E) AP (M) AP (H)
pose_hrnet_w32 0.670 0.854 0.724 0.755 0.680 0.569
pose_hrnet_w48 0.680 0.855 0.734 0.766 0.688 0.584

Results with matching regression results to the closest keypoints detected from the keypoint heatmaps

DEKR-w32-SS DEKR-w32-MS DEKR-w48-SS DEKR-w48-MS
coco_val2017 0.680 0.710 0.710 0.728
coco_test-dev2017 0.673 0.702 0.701 0.714
crowdpose_test 0.655 0.675 0.670 0.683

Note:

  • Flip test is used.
  • GFLOPs is for convolution and linear layers only.

Environment

The code is developed using python 3.6 on Ubuntu 16.04. NVIDIA GPUs are needed. The code is developed and tested using 4 NVIDIA V100 GPU cards for HRNet-w32 and 8 NVIDIA V100 GPU cards for HRNet-w48. Other platforms are not fully tested.

Quick start

Installation

  1. Clone this repo, and we'll call the directory that you cloned as ${POSE_ROOT}.

  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Install COCOAPI:

    # COCOAPI=/path/to/clone/cocoapi
    git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
    cd $COCOAPI/PythonAPI
    # Install into global site-packages
    make install
    # Alternatively, if you do not have permissions or prefer
    # not to install the COCO API into global site-packages
    python3 setup.py install --user
    

    Note that instructions like # COCOAPI=/path/to/install/cocoapi indicate that you should pick a path where you'd like to have the software cloned and then set an environment variable (COCOAPI in this case) accordingly.

  4. Install CrowdPoseAPI exactly the same as COCOAPI.

  5. Init output(training model output directory) and log(tensorboard log directory) directory:

    mkdir output 
    mkdir log
    

    Your directory tree should look like this:

    ${POSE_ROOT}
    ├── data
    ├── model
    ├── experiments
    ├── lib
    ├── tools 
    ├── log
    ├── output
    ├── README.md
    ├── requirements.txt
    └── setup.py
    
  6. Download pretrained models and our well-trained models from zoo(OneDrive) and make models directory look like this:

    ${POSE_ROOT}
    |-- model
    `-- |-- imagenet
        |   |-- hrnet_w32-36af842e.pth
        |   `-- hrnetv2_w48_imagenet_pretrained.pth
        |-- pose_coco
        |   |-- pose_dekr_hrnetw32_coco.pth
        |   `-- pose_dekr_hrnetw48_coco.pth
        |-- pose_crowdpose
        |   |-- pose_dekr_hrnetw32_crowdpose.pth
        |   `-- pose_dekr_hrnetw48_crowdpose.pth
        `-- rescore
            |-- final_rescore_coco_kpt.pth
            `-- final_rescore_crowd_pose_kpt.pth
    

Data preparation

For COCO data, please download from COCO download, 2017 Train/Val is needed for COCO keypoints training and validation. Download and extract them under {POSE_ROOT}/data, and make them look like this:

${POSE_ROOT}
|-- data
`-- |-- coco
    `-- |-- annotations
        |   |-- person_keypoints_train2017.json
        |   `-- person_keypoints_val2017.json
        `-- images
            |-- train2017.zip
            `-- val2017.zip

For CrowdPose data, please download from CrowdPose download, Train/Val is needed for CrowdPose keypoints training. Download and extract them under {POSE_ROOT}/data, and make them look like this:

${POSE_ROOT}
|-- data
`-- |-- crowdpose
    `-- |-- json
        |   |-- crowdpose_train.json
        |   |-- crowdpose_val.json
        |   |-- crowdpose_trainval.json (generated by tools/crowdpose_concat_train_val.py)
        |   `-- crowdpose_test.json
        `-- images.zip

After downloading data, run python tools/crowdpose_concat_train_val.py under ${POSE_ROOT} to create trainval set.

Training and Testing

Testing on COCO val2017 dataset without multi-scale test using well-trained pose model

python tools/valid.py \
    --cfg experiments/coco/w32/w32_4x_reg03_bs10_512_adam_lr1e-3_coco_x140.yaml \
    TEST.MODEL_FILE models/pose_coco/pose_dekr_hrnetw32_coco.pth

Testing on COCO test-dev2017 dataset without multi-scale test using well-trained pose model

python tools/valid.py \
    --cfg experiments/coco/w32/w32_4x_reg03_bs10_512_adam_lr1e-3_coco_x140.yaml \
    TEST.MODEL_FILE models/pose_coco/pose_dekr_hrnetw32_coco.pth \ 
    DATASET.TEST test-dev2017

Testing on COCO val2017 dataset with multi-scale test using well-trained pose model

python tools/valid.py \
    --cfg experiments/coco/w32/w32_4x_reg03_bs10_512_adam_lr1e-3_coco_x140.yaml \
    TEST.MODEL_FILE models/pose_coco/pose_dekr_hrnetw32_coco.pth \ 
    TEST.NMS_THRE 0.15 \
    TEST.SCALE_FACTOR 0.5,1,2

Testing on COCO val2017 dataset with matching regression results to the closest keypoints detected from the keypoint heatmaps

python tools/valid.py \
    --cfg experiments/coco/w32/w32_4x_reg03_bs10_512_adam_lr1e-3_coco_x140.yaml \
    TEST.MODEL_FILE models/pose_coco/pose_dekr_hrnetw32_coco.pth \ 
    TEST.MATCH_HMP True

Testing on crowdpose test dataset without multi-scale test using well-trained pose model

python tools/valid.py \
    --cfg experiments/crowdpose/w32/w32_4x_reg03_bs10_512_adam_lr1e-3_crowdpose_x300.yaml \
    TEST.MODEL_FILE models/pose_crowdpose/pose_dekr_hrnetw32_crowdpose.pth

Testing on crowdpose test dataset with multi-scale test using well-trained pose model

python tools/valid.py \
    --cfg experiments/crowdpose/w32/w32_4x_reg03_bs10_512_adam_lr1e-3_crowdpose_x300.yaml \
    TEST.MODEL_FILE models/pose_crowdpose/pose_dekr_hrnetw32_crowdpose.pth \ 
    TEST.NMS_THRE 0.15 \
    TEST.SCALE_FACTOR 0.5,1,2

Testing on crowdpose test dataset with matching regression results to the closest keypoints detected from the keypoint heatmaps

python tools/valid.py \
    --cfg experiments/crowdpose/w32/w32_4x_reg03_bs10_512_adam_lr1e-3_crowdpose_x300.yaml \
    TEST.MODEL_FILE models/pose_crowdpose/pose_dekr_hrnetw32_crowdpose.pth \ 
    TEST.MATCH_HMP True

Training on COCO train2017 dataset

python tools/train.py \
    --cfg experiments/coco/w32/w32_4x_reg03_bs10_512_adam_lr1e-3_coco_x140.yaml \

Training on Crowdpose trainval dataset

python tools/train.py \
    --cfg experiments/crowdpose/w32/w32_4x_reg03_bs10_512_adam_lr1e-3_crowdpose_x300.yaml \

Using inference demo

python tools/inference_demo.py --cfg experiments/coco/inference_demo_coco.yaml \
    --videoFile ../multi_people.mp4 \
    --outputDir output \
    --visthre 0.3 \
    TEST.MODEL_FILE model/pose_coco/pose_dekr_hrnetw32.pth
python tools/inference_demo.py --cfg experiments/crowdpose/inference_demo_crowdpose.yaml \
    --videoFile ../multi_people.mp4 \
    --outputDir output \
    --visthre 0.3 \
    TEST.MODEL_FILE model/pose_crowdpose/pose_dekr_hrnetw32.pth \

The above command will create a video under output directory and a lot of pose image under output/pose directory.

Scoring net

We use a scoring net, consisting of two fully-connected layers (each followed by a ReLU layer), and a linear prediction layer which aims to learn the OKS score for the corresponding predicted pose. For this scoring net, you can directly use our well-trained model in the model/rescore folder. You can also train your scoring net using your pose estimation model by the following steps:

  1. Generate scoring dataset on train dataset:
python tools/valid.py \
    --cfg experiments/coco/rescore_coco.yaml \
    TEST.MODEL_FILE model/pose_coco/pose_dekr_hrnetw32.pth
python tools/valid.py \
    --cfg experiments/crowdpose/rescore_crowdpose.yaml \
    TEST.MODEL_FILE model/pose_crowdpose/pose_dekr_hrnetw32.pth \
  1. Train the scoring net using the scoring dataset:
python tools/train_scorenet.py \
    --cfg experiment/coco/rescore_coco.yaml
python tools/train_scorenet.py \
    --cfg experiments/crowdpose/rescore_crowdpose.yaml \
  1. Using the well-trained scoring net to improve the performance of your pose estimation model (above 0.6AP).
python tools/valid.py \
    --cfg experiments/coco/w32/w32_4x_reg03_bs10_512_adam_lr1e-3_coco_x140.yaml \
    TEST.MODEL_FILE models/pose_coco/pose_dekr_hrnetw32_coco.pth
python tools/valid.py \
    --cfg experiments/crowdpose/w32/w32_4x_reg03_bs10_512_adam_lr1e-3_crowdpose_x300.yaml \
    TEST.MODEL_FILE models/pose_crowdpose/pose_dekr_hrnetw32_crowdpose.pth \

Acknowledge

Our code is mainly based on HigherHRNet.

Citation

@inproceedings{GengSXZW21,
  title={Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression},
  author={Zigang Geng, Ke Sun, Bin Xiao, Zhaoxiang Zhang, Jingdong Wang},
  booktitle={CVPR},
  year={2021}
}

@inproceedings{SunXLW19,
  title={Deep High-Resolution Representation Learning for Human Pose Estimation},
  author={Ke Sun and Bin Xiao and Dong Liu and Jingdong Wang},
  booktitle={CVPR},
  year={2019}
}

@article{WangSCJDZLMTWLX19,
  title={Deep High-Resolution Representation Learning for Visual Recognition},
  author={Jingdong Wang and Ke Sun and Tianheng Cheng and 
          Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and 
          Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
  journal={TPAMI}
  year={2019}
}
Owner
HRNet
Code for pose estimation is available at https://github.com/leoxiaobin/deep-high-resolution-net.pytorch
HRNet
Cleaned up code for DSTC 10: SIMMC 2.0 track: subtask 2: multimodal coreference resolution

UNITER-Based Situated Coreference Resolution with Rich Multimodal Input: arXiv MMCoref_cleaned Code for the MMCoref task of the SIMMC 2.0 dataset. Pre

Yichen (William) Huang 2 Dec 05, 2022
Scripts and a shader to get you started on setting up an exported Koikatsu character in Blender.

KK Blender Shader Pack A plugin and a shader to get you started with setting up an exported Koikatsu character in Blender. The plugin is a Blender add

166 Jan 01, 2023
The official PyTorch implementation of recent paper - SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training

This repository is the official PyTorch implementation of SAINT. Find the paper on arxiv SAINT: Improved Neural Networks for Tabular Data via Row Atte

Gowthami Somepalli 284 Dec 21, 2022
Official code repository for A Simple Long-Tailed Rocognition Baseline via Vision-Language Model.

BALLAD This is the official code repository for A Simple Long-Tailed Rocognition Baseline via Vision-Language Model. Requirements Python3 Pytorch(1.7.

peng gao 42 Nov 26, 2022
Instance Semantic Segmentation List

Instance Semantic Segmentation List This repository contains lists of state-or-art instance semantic segmentation works. Papers and resources are list

bighead 87 Mar 06, 2022
This is an official implementation for "AS-MLP: An Axial Shifted MLP Architecture for Vision".

AS-MLP architecture for Image Classification Model Zoo Image Classification on ImageNet-1K Network Resolution Top-1 (%) Params FLOPs Throughput (image

SVIP Lab 106 Dec 12, 2022
Object detection evaluation metrics using Python.

Object detection evaluation metrics using Python.

Louis Facun 2 Sep 06, 2022
Official Implementation of Few-shot Visual Relationship Co-localization

VRC Official implementation of the Few-shot Visual Relationship Co-localization (ICCV 2021) paper project page | paper Requirements Use python = 3.8.

22 Oct 13, 2022
Generating Digital Painting Lighting Effects via RGB-space Geometry (SIGGRAPH2020/TOG2020)

Project PaintingLight PaintingLight is a project conducted by the Style2Paints team, aimed at finding a method to manipulate the illumination in digit

651 Dec 29, 2022
A PyTorch implementation of deep-learning-based registration

DiffuseMorph Implementation A PyTorch implementation of deep-learning-based registration. Requirements OS : Ubuntu / Windows Python 3.6 PyTorch 1.4.0

24 Jan 03, 2023
LyaNet: A Lyapunov Framework for Training Neural ODEs

LyaNet: A Lyapunov Framework for Training Neural ODEs Provide the model type--config-name to train and test models configured as those shown in the pa

Ivan Dario Jimenez Rodriguez 21 Nov 21, 2022
Supporting code for "Autoregressive neural-network wavefunctions for ab initio quantum chemistry".

naqs-for-quantum-chemistry This repository contains the codebase developed for the paper Autoregressive neural-network wavefunctions for ab initio qua

Tom Barrett 24 Dec 23, 2022
Tzer: TVM Implementation of "Coverage-Guided Tensor Compiler Fuzzing with Joint IR-Pass Mutation (OOPSLA'22)“.

Artifact • Reproduce Bugs • Quick Start • Installation • Extend Tzer Coverage-Guided Tensor Compiler Fuzzing with Joint IR-Pass Mutation This is the s

12 Dec 29, 2022
RIFE: Real-Time Intermediate Flow Estimation for Video Frame Interpolation

RIFE - Real Time Video Interpolation arXiv | YouTube | Colab | Tutorial | Demo Table of Contents Introduction Collection Usage Evaluation Training and

hzwer 3k Jan 04, 2023
This repository holds the code for the paper "Deep Conditional Gaussian Mixture Model forConstrained Clustering".

Deep Conditional Gaussian Mixture Model for Constrained Clustering. This repository holds the code for the paper Deep Conditional Gaussian Mixture Mod

17 Oct 30, 2022
Extending JAX with custom C++ and CUDA code

Extending JAX with custom C++ and CUDA code This repository is meant as a tutorial demonstrating the infrastructure required to provide custom ops in

Dan Foreman-Mackey 237 Dec 23, 2022
CLIP (Contrastive Language–Image Pre-training) for Italian

Italian CLIP CLIP (Radford et al., 2021) is a multimodal model that can learn to represent images and text jointly in the same space. In this project,

Italian CLIP 114 Dec 29, 2022
Efficient semidefinite bounds for multi-label discrete graphical models.

Low rank solvers #################################### benchmark/ : folder with the random instances used in the paper. ############################

1 Dec 08, 2022
The Unsupervised Reinforcement Learning Benchmark (URLB)

The Unsupervised Reinforcement Learning Benchmark (URLB) URLB provides a set of leading algorithms for unsupervised reinforcement learning where agent

259 Dec 26, 2022
LF-YOLO (Lighter and Faster YOLO) is used to detect defect of X-ray weld image.

This project is based on ultralytics/yolov3. LF-YOLO (Lighter and Faster YOLO) is used to detect defect of X-ray weld image. The related paper is avai

26 Dec 13, 2022