KAPAO is an efficient multi-person human pose estimation model that detects keypoints and poses as objects and fuses the detections to predict human poses.

Last update: Dec 30, 2022

Overview

KAPAO (Keypoints and Poses as Objects)

KAPAO is an efficient single-stage multi-person human pose estimation model that models keypoints and poses as objects within a dense anchor-based detection framework. When not using test-time augmentation (TTA), KAPAO is much faster and more accurate than previous single-stage methods like DEKR and HigherHRNet:

This repository contains the official PyTorch implementation for the paper:
Rethinking Keypoint Representations: Modeling Keypoints and Poses as Objects for Multi-Person Human Pose Estimation.

Our code was forked from ultralytics/yolov5 at commit 5487451.

Setup

If you haven't already, install Anaconda or Miniconda.
Create a new conda environment with Python 3.6: $ conda create -n kapao python=3.6.
Activate the environment: $ conda activate kapao
Clone this repo: $ git clone https://github.com/wmcnally/kapao.git
Install the dependencies: $ cd kapao && pip install -r requirements.txt
Download the trained models: $ sh data/scripts/download_models.sh

Inference Demos

Note: FPS calculations includes all processing, including inference, plotting / tracking, image resizing, etc. See demo script arguments for inference options.

Flash Mob Demo

This demo runs inference on a 720p dance video (native frame-rate of 25 FPS).

To display the inference results in real-time:
$ python demos/flash_mob.py --weights kapao_s_coco.pt --display --fps

To create the GIF above:
$ python demos/flash_mob.py --weights kapao_s_coco.pt --start 188 --end 196 --gif --fps

Squash Demo

This demo runs inference on a 1080p slow motion squash video (native frame-rate of 25 FPS). It uses a simple player tracking algorithm based on the frame-to-frame pose differences.

To display the inference results in real-time:
$ python demos/squash.py --weights kapao_s_coco.pt --display --fps

To create the GIF above:
$ python demos/squash.py --weights kapao_s_coco.pt --start 42 --end 50 --gif --fps

COCO Experiments

Download the COCO dataset: $ sh data/scripts/get_coco_kp.sh

Validation (without TTA)

KAPAO-S (63.0 AP): $ python val.py --rect
KAPAO-M (68.5 AP): $ python val.py --rect --weights kapao_m_coco.pt
KAPAO-L (70.6 AP): $ python val.py --rect --weights kapao_l_coco.pt

Validation (with TTA)

KAPAO-S (64.3 AP): $ python val.py --scales 0.8 1 1.2 --flips -1 3 -1
KAPAO-M (69.6 AP): $ python val.py --weights kapao_m_coco.pt \
--scales 0.8 1 1.2 --flips -1 3 -1
KAPAO-L (71.6 AP): $ python val.py --weights kapao_l_coco.pt \
--scales 0.8 1 1.2 --flips -1 3 -1

Testing

KAPAO-S (63.8 AP): $ python val.py --scales 0.8 1 1.2 --flips -1 3 -1 --task test
KAPAO-M (68.8 AP): $ python val.py --weights kapao_m_coco.pt \
--scales 0.8 1 1.2 --flips -1 3 -1 --task test
KAPAO-L (70.3 AP): $ python val.py --weights kapao_l_coco.pt \
--scales 0.8 1 1.2 --flips -1 3 -1 --task test

Training

The following commands were used to train the KAPAO models on 4 V100s with 32GB memory each.

KAPAO-S:

python -m torch.distributed.launch --nproc_per_node 4 train.py \
--img 1280 \
--batch 128 \
--epochs 500 \
--data data/coco-kp.yaml \
--hyp data/hyps/hyp.kp-p6.yaml \
--val-scales 1 \
--val-flips -1 \
--weights yolov5s6.pt \
--project runs/s_e500 \
--name train \
--workers 128

KAPAO-M:

python train.py \
--img 1280 \
--batch 72 \
--epochs 500 \
--data data/coco-kp.yaml \
--hyp data/hyps/hyp.kp-p6.yaml \
--val-scales 1 \
--val-flips -1 \
--weights yolov5m6.pt \
--project runs/m_e500 \
--name train \
--workers 128

KAPAO-L:

python train.py \
--img 1280 \
--batch 48 \
--epochs 500 \
--data data/coco-kp.yaml \
--hyp data/hyps/hyp.kp-p6.yaml \
--val-scales 1 \
--val-flips -1 \
--weights yolov5l6.pt \
--project runs/l_e500 \
--name train \
--workers 128

Note: DDP is usually recommended but we found training was less stable for KAPAO-M/L using DDP. We are investigating this issue.

CrowdPose Experiments

Install the CrowdPose API to your conda environment:
$ cd .. && git clone https://github.com/Jeff-sjtu/CrowdPose.git
$ cd CrowdPose/crowdpose-api/PythonAPI && sh install.sh && cd ../../../kapao
Download the CrowdPose dataset: $ sh data/scripts/get_crowdpose.sh

Testing

KAPAO-S (63.8 AP): $ python val.py --data crowdpose.yaml \
--weights kapao_s_crowdpose.pt --scales 0.8 1 1.2 --flips -1 3 -1
KAPAO-M (67.1 AP): $ python val.py --data crowdpose.yaml \
--weights kapao_m_crowdpose.pt --scales 0.8 1 1.2 --flips -1 3 -1
KAPAO-L (68.9 AP): $ python val.py --data crowdpose.yaml \
--weights kapao_l_crowdpose.pt --scales 0.8 1 1.2 --flips -1 3 -1

Training

The following commands were used to train the KAPAO models on 4 V100s with 32GB memory each. Training was performed on the trainval split with no validation. The test results above were generated using the last model checkpoint.

KAPAO-S:

python -m torch.distributed.launch --nproc_per_node 4 train.py \
--img 1280 \
--batch 128 \
--epochs 300 \
--data data/crowdpose.yaml \
--hyp data/hyps/hyp.kp-p6.yaml \
--val-scales 1 \
--val-flips -1 \
--weights yolov5s6.pt \
--project runs/cp_s_e300 \
--name train \
--workers 128 \
--noval

KAPAO-M:

python train.py \
--img 1280 \
--batch 72 \
--epochs 300 \
--data data/coco-kp.yaml \
--hyp data/hyps/hyp.kp-p6.yaml \
--val-scales 1 \
--val-flips -1 \
--weights yolov5m6.pt \
--project runs/cp_m_e300 \
--name train \
--workers 128 \
--noval

KAPAO-L:

python train.py \
--img 1280 \
--batch 48 \
--epochs 300 \
--data data/crowdpose.yaml \
--hyp data/hyps/hyp.kp-p6.yaml \
--val-scales 1 \
--val-flips -1 \
--weights yolov5l6.pt \
--project runs/cp_l_e300 \
--name train \
--workers 128 \
--noval

Acknowledgements

This work was supported in part by Compute Canada, the Canada Research Chairs Program, the Natural Sciences and Engineering Research Council of Canada, a Microsoft Azure Grant, and an NVIDIA Hardware Grant.

If you find this repo is helpful in your research, please cite our paper:

@article{mcnally2021kapao,
  title={Rethinking Keypoint Representations: Modeling Keypoints and Poses as Objects for Multi-Person Human Pose Estimation},
  author={McNally, William and Vats, Kanav and Wong, Alexander and McPhee, John},
  journal={arXiv preprint arXiv:2111.08557},
  year={2021}
}

Please also consider citing our previous works:

@inproceedings{mcnally2021deepdarts,
  title={DeepDarts: Modeling Keypoints as Objects for Automatic Scorekeeping in Darts using a Single Camera},
  author={McNally, William and Walters, Pascale and Vats, Kanav and Wong, Alexander and McPhee, John},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={4547--4556},
  year={2021}
}

@article{mcnally2021evopose2d,
  title={EvoPose2D: Pushing the Boundaries of 2D Human Pose Estimation Using Accelerated Neuroevolution With Weight Transfer},
  author={McNally, William and Vats, Kanav and Wong, Alexander and McPhee, John},
  journal={IEEE Access},
  volume={9},
  pages={139403--139414},
  year={2021},
  publisher={IEEE}
}

KAPAO is an efficient multi-person human pose estimation model that detects keypoints and poses as objects and fuses the detections to predict human poses.

Related tags

Overview

KAPAO (Keypoints and Poses as Objects)

Setup

Inference Demos

Flash Mob Demo

Squash Demo

COCO Experiments

Validation (without TTA)

Validation (with TTA)

Testing

Training

CrowdPose Experiments

Testing

Training

Acknowledgements

Owner

Will McNally

CoMoGAN: continuous model-guided image-to-image translation. CVPR 2021 oral.

You Only 👀 One Sequence

Repositório da disciplina de APC, no segundo semestre de 2021

DiscoNet: Learning Distilled Collaboration Graph for Multi-Agent Perception [NeurIPS 2021]

Numbering permanent and deciduous teeth via deep instance segmentation in panoramic X-rays

InterFaceGAN - Interpreting the Latent Space of GANs for Semantic Face Editing

Sync2Gen Code for ICCV 2021 paper: Scene Synthesis via Uncertainty-Driven Attribute Synchronization

Analysis of Smiles through reservoir sampling & RDkit

Technical Indicators implemented in Python only using Numpy-Pandas as Magic - Very Very Fast! Very tiny! Stock Market Financial Technical Analysis Python library . Quant Trading automation or cryptocoin exchange

python 93% acc. CNN Dogs Vs Cats ( Pytorch )

Current state of supervised and unsupervised depth completion methods

A light-weight image labelling tool for Python designed for creating segmentation data sets.

This is an official implementation for "Video Swin Transformers".

The official implementation of Equalization Loss for Long-Tailed Object Recognition (CVPR 2020) based on Detectron2

Omnidirectional Scene Text Detection with Sequential-free Box Discretization (IJCAI 2019). Including competition model, online demo, etc.

Code for paper entitled "Improving Novelty Detection using the Reconstructions of Nearest Neighbours"

EPSANet：An Efficient Pyramid Split Attention Block on Convolutional Neural Network

Virtual Dance Reality Stage is a feature that offers you to share a stage with another user virtually.

Data cleaning, missing value handle, EDA use in this project

Our implementation used for the MICCAI 2021 FLARE Challenge titled 'Efficient Multi-Organ Segmentation Using SpatialConfiguartion-Net with Low GPU Memory Requirements'.