Basics of 2D and 3D Human Pose Estimation.

Overview

Human Pose Estimation 101

If you want a slightly more rigorous tutorial and understand the basics of Human Pose Estimation and how the field has evolved, check out these articles I published on 2D Pose Estimation and 3D Pose Estimation

Table of Contents

Basics

  • Defined as the problem of localization of human joints (or) keypoints
  • A rigid body consists of joints and rigid parts. A body with strong articulation is a body with strong contortion.
  • Pose Estimation is the search for a specific pose in space of all articulated poses
  • Number of keypoints varies with dataset - LSP has 14, MPII has 16, 16 are used in Human3.6m
  • Classifed into 2D and 3D Pose Estimation
    • 2D Pose Estimation
    • Estimate a 2D pose (x,y) coordinates for each joint in pixel space from a RGB image
    • 3D Pose Estimation
    • Estimate a 3D pose (x,y,z) coordinates in metric space from a RGB image, or in previous works, data from a RGB-D sensor. (However, research in the past few years is heavily focussed on generating 3D poses from 2D images / 2D videos)

Loss

  • Most commonly used loss function - Mean Squared Error, MSE(Least Squares Loss)
  • This is a regression problem. The model will try to regress to the the correct coordinates, i.e move to the ground truth coordinatate’s in small increments. The model is trained to output continuous coordinates using a Mean Squared Error loss function

Evaluation metrics

Percentage of Correct Parts - PCP

  • A limb is considered detected and a correct part if the distance between the two predicted joint locations and the true limb joint locations is at most half of the limb length (PCP at 0.5 )
  • Measures detection rate of limbs
  • Cons - penalizes shorter limbs
  • Calculation
    • For a specific part, PCP = (No. of correct parts for entire dataset) / (No. of total parts for entire dataset)
    • Take a dataset with 10 images and 1 pose per image. Each pose has 8 parts - ( upper arm, lower arm, upper leg, lower leg ) x2
    • No of upper arms = 10 * 2 = 20
    • No of lower arms = 20
    • No of lower legs = No of upper legs = 20
    • If upper arm is detected correct for 17 out of the 20 upper arms i.e 17 ( 10 right arms and 7 left) → PCP = 17/20 = 85%
  • Higher the better

Percentage of Correct Key-points - PCK

  • Detected joint is considered correct if the distance between the predicted and the true joint is within a certain threshold (threshold varies)
  • [email protected] is when the threshold = 50% of the head bone link
  • [email protected] == Distance between predicted and true joint < 0.2 * torso diameter
  • Sometimes 150 mm is taken as the threshold
  • Head, shoulder, Elbow, Wrist, Hip, Knee, Ankle → Keypoints
  • PCK is used for 2D and 3D (PCK3D)
  • Higher the better

Percentage of Detected Joints - PDJ

  • Detected joint is considered correct if the distance between the predicted and the true joint is within a certain fraction of the torso diameter
  • Alleviates the shorter limb problem since shorter limbs have smaller torsos
  • PDJ at 0.2 → Distance between predicted and true join < 0.2 * torso diameter
  • Typically used for 2D Pose Estimation
  • Higher the better

Mean Per Joint Position Error - MPJPE

  • Per joint position error = Euclidean distance between ground truth and prediction for a joint
  • Mean per joint position error = Mean of per joint position error for all k joints (Typically, k = 16)
  • Calculated after aligning the root joints (typically the pelvis) of the estimated and groundtruth 3D pose.
  • PA MPJPE
    • Procrustes analysis MPJPE.
    • MPJPE calculated after the estimated 3D pose is aligned to the groundtruth by the Procrustes method
    • Procrustes method is simply a similarity transformation
  • Lower the better
  • Used for 3D Pose Estimation

AUC

Important Applications

  • Activity Analysis
  • Human-Computer Interaction (HCI)
  • Virtual Reality
  • Augmented Reality
  • Amazon Go presents an important domain for the application of Human Pose Estimation. Cameras track and recognize people and their actions, for which Pose Estimation is an important component. Entities relying on services that track and measure human activities rely heavily on human Pose Estimation

Informative roadmap on 2D Human Pose Estimation research

Owner
Sudharshan Chandra Babu
Machine Learning Engineer
Sudharshan Chandra Babu
CNN Based Meta-Learning for Noisy Image Classification and Template Matching

CNN Based Meta-Learning for Noisy Image Classification and Template Matching Introduction This master thesis used a few-shot meta learning approach to

Kumar Manas 2 Dec 09, 2021
PyTorch implementation of "Contrast to Divide: self-supervised pre-training for learning with noisy labels"

Contrast to Divide: self-supervised pre-training for learning with noisy labels This is an official implementation of "Contrast to Divide: self-superv

55 Nov 23, 2022
A system used to detect whether a person is wearing a medical mask or not.

Mask_Detection_System A system used to detect whether a person is wearing a medical mask or not. To open the program, please follow these steps: Make

Mohamed Emad 0 Nov 17, 2022
An implementation of the [Hierarchical (Sig-Wasserstein) GAN] algorithm for large dimensional Time Series Generation

Hierarchical GAN for large dimensional financial market data Implementation This repository is an implementation of the [Hierarchical (Sig-Wasserstein

11 Nov 29, 2022
Answering Open-Domain Questions of Varying Reasoning Steps from Text

This repository contains the authors' implementation of the Iterative Retriever, Reader, and Reranker (IRRR) model in the EMNLP 2021 paper "Answering Open-Domain Questions of Varying Reasoning Steps

26 Dec 22, 2022
PyTorch code for the NAACL 2021 paper "Improving Generation and Evaluation of Visual Stories via Semantic Consistency"

Improving Generation and Evaluation of Visual Stories via Semantic Consistency PyTorch code for the NAACL 2021 paper "Improving Generation and Evaluat

Adyasha Maharana 28 Dec 08, 2022
基于深度强化学习的原神自动钓鱼AI

原神自动钓鱼AI由YOLOX, DQN两部分模型组成。使用迁移学习,半监督学习进行训练。 模型也包含一些使用opencv等传统数字图像处理方法实现的不可学习部分。

4.2k Jan 01, 2023
PocketNet: Extreme Lightweight Face Recognition Network using Neural Architecture Search and Multi-Step Knowledge Distillation

PocketNet This is the official repository of the paper: PocketNet: Extreme Lightweight Face Recognition Network using Neural Architecture Search and M

Fadi Boutros 40 Dec 22, 2022
CPF: Learning a Contact Potential Field to Model the Hand-object Interaction

Contact Potential Field This repo contains model, demo, and test codes of our paper: CPF: Learning a Contact Potential Field to Model the Hand-object

Lixin YANG 99 Dec 26, 2022
Dictionary Learning with Uniform Sparse Representations for Anomaly Detection

Dictionary Learning with Uniform Sparse Representations for Anomaly Detection Implementation of the Uniform DL Representation for AD algorithm describ

Paul Irofti 1 Nov 23, 2022
Affine / perspective transformation in Pose Estimation with Tensorflow 2

Pose Transformation Affine / Perspective transformation in Pose Estimation with Tensorflow 2 Introduction 이 repo는 pose estimation을 연구하고 개발하는 데 도움이 되기

Kim Junho 1 Dec 22, 2021
113 Nov 28, 2022
f-BRS: Rethinking Backpropagating Refinement for Interactive Segmentation

f-BRS: Rethinking Backpropagating Refinement for Interactive Segmentation [Paper] [PyTorch] [MXNet] [Video] This repository provides code for training

Visual Understanding Lab @ Samsung AI Center Moscow 516 Dec 21, 2022
Official PyTorch implementation for "Low Precision Decentralized Distributed Training with Heterogenous Data"

Low Precision Decentralized Training with Heterogenous Data Official PyTorch implementation for "Low Precision Decentralized Distributed Training with

Aparna Aketi 0 Nov 23, 2021
ICNet and PSPNet-50 in Tensorflow for real-time semantic segmentation

Real-Time Semantic Segmentation in TensorFlow Perform pixel-wise semantic segmentation on high-resolution images in real-time with Image Cascade Netwo

Oles Andrienko 219 Nov 21, 2022
COD-Rank-Localize-and-Segment (CVPR2021)

COD-Rank-Localize-and-Segment (CVPR2021) Simultaneously Localize, Segment and Rank the Camouflaged Objects Full camouflage fixation training dataset i

JingZhang 52 Dec 20, 2022
Source code of our work: "Benchmarking Deep Models for Salient Object Detection"

SALOD Source code of our work: "Benchmarking Deep Models for Salient Object Detection". In this works, we propose a new benchmark for SALient Object D

22 Dec 30, 2022
SMIS - Semantically Multi-modal Image Synthesis(CVPR 2020)

Semantically Multi-modal Image Synthesis Project page / Paper / Demo Semantically Multi-modal Image Synthesis(CVPR2020). Zhen Zhu, Zhiliang Xu, Anshen

316 Dec 01, 2022
RL and distillation in CARLA using a factorized world model

World on Rails Learning to drive from a world on rails Dian Chen, Vladlen Koltun, Philipp Krähenbühl, arXiv techical report (arXiv 2105.00636) This re

Dian Chen 131 Dec 16, 2022
Yggdrasil - A simplistic bot designed to streamline your server experience

Ygggdrasil A simplistic bot designed to streamline your server experience. Desig

Sntx_ 1 Dec 14, 2022