[ICCV 2021] Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation

Last update: Dec 15, 2022

MAED: Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation

Getting Started

Our code is implemented and tested with Python 3.6 and PyTorch 1.5.

Install PyTorch following the official guide on the PyTorch website.

Then install the requirements inside a virtualenv or conda environment:

pip install -r requirements.txt
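Before launching a long training job, it can help to confirm that the active environment matches the versions stated above. The following is a minimal sketch, not part of the repository: `check_environment` is a hypothetical helper that compares the interpreter version with the tested Python 3.6 and checks that the `torch` package can be located, without importing it.

```python
import importlib.util
import sys

# Python version this README reports as tested; adjust if you knowingly use another.
TESTED_PYTHON = (3, 6)


def check_environment(required=("torch",)):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    current = sys.version_info[:2]
    if current != TESTED_PYTHON:
        problems.append(
            "running Python {}.{}, but the code was tested with {}.{}".format(
                current[0], current[1], TESTED_PYTHON[0], TESTED_PYTHON[1]
            )
        )
    for name in required:
        # find_spec only locates a top-level package; it does not import it.
        if importlib.util.find_spec(name) is None:
            problems.append("package {!r} is not installed".format(name))
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment looks consistent with the README"]:
        print(line)
```

A version mismatch is reported as a warning only; newer Python or PyTorch versions may still work but are not covered by the statement above.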

Data Preparation

Refer to data.md for instructions.

Training

Stage 1 training

In general, you can start training with PyTorch's distributed launch script.

For example, to train on 2 nodes with 4 GPUs each (2x4=8 GPUs in total):

On node 0, run:

python -u -m torch.distributed.launch \
    --nnodes=2 \
    --node_rank=0 \
    --nproc_per_node=4 \
    --master_port=<MASTER_PORT> \
    --master_addr=<MASTER_NODE_ID> \
    --use_env \
    train.py --cfg configs/config_stage1.yaml

On node 1, run:

python -u -m torch.distributed.launch \
    --nnodes=2 \
    --node_rank=1 \
    --nproc_per_node=4 \
    --master_port=<MASTER_PORT> \
    --master_addr=<MASTER_NODE_ID> \
    --use_env \
    train.py --cfg configs/config_stage1.yaml
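The two commands above differ only in `--node_rank`. With `--use_env`, `torch.distributed.launch` passes each worker its local rank through the `LOCAL_RANK` environment variable instead of a `--local_rank` argument, and also sets `RANK` and `WORLD_SIZE`. The sketch below only reproduces that numbering so the 2x4 layout is easy to verify; `launch_layout` is an illustrative helper, not something provided by PyTorch or this repository.

```python
def launch_layout(nnodes, nproc_per_node):
    """Enumerate the rank-related values each worker process would see."""
    world_size = nnodes * nproc_per_node
    layout = []
    for node_rank in range(nnodes):
        for local_rank in range(nproc_per_node):
            # Workers are numbered node by node, so the global rank is
            # node_rank * nproc_per_node + local_rank.
            rank = node_rank * nproc_per_node + local_rank
            layout.append(
                {
                    "node_rank": node_rank,
                    "LOCAL_RANK": local_rank,
                    "RANK": rank,
                    "WORLD_SIZE": world_size,
                }
            )
    return layout


if __name__ == "__main__":
    for worker in launch_layout(nnodes=2, nproc_per_node=4):
        print(worker)
```

For the example above, node 0 hosts global ranks 0 to 3 and node 1 hosts global ranks 4 to 7, with a world size of 8.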

Alternatively, if you use a task scheduling system such as Slurm to submit training jobs, you can start training with the following script:

# training on 2 nodes, 4 gpus each (2x4=8 gpus total)
sh scripts/run.sh 2 4 configs/config_stage1.yaml

Training checkpoints are saved in results/ by default. You can change the output directory in the config file.

Stage 2 training

Use the last checkpoint of stage 1 to initialize the model, then start stage 2 training.

# On Node 0.
python -u -m torch.distributed.launch \
    --nnodes=2 \
    --node_rank=0 \
    --nproc_per_node=4 \
    --master_port=<MASTER_PORT> \
    --master_addr=<MASTER_NODE_ID> \
    --use_env \
    train.py --cfg configs/config_stage2.yaml --pretrained <PATH_TO_CHECKPOINT_FILE>

On node 1, run the same command with --node_rank=1.
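To fill in `<PATH_TO_CHECKPOINT_FILE>`, one option is to pick the most recently written checkpoint from the stage 1 output directory. The sketch below is a hypothetical helper, not part of the repository. It assumes checkpoints use the `.pth.tar` suffix (borrowed from `model_best.pth.tar` in the evaluation command) and treats "last" as "most recently modified", which may not match how the training script names or orders its files.

```python
import glob
import os


def latest_checkpoint(results_dir, pattern="*.pth.tar"):
    """Return the most recently modified file matching pattern, or None."""
    # "**" with recursive=True also matches files directly inside results_dir.
    candidates = glob.glob(os.path.join(results_dir, "**", pattern), recursive=True)
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


if __name__ == "__main__":
    # "results" is the default output directory mentioned in this README.
    print(latest_checkpoint("results"))
```

Check the returned path before passing it to `--pretrained`, since a best-model file written late in training would also match the pattern.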

Evaluation

To evaluate the model on the 3DPW test set:

python eval.py --cfg <PATH_TO_EXPERIMENT>/config.yaml --checkpoint <PATH_TO_EXPERIMENT>/model_best.pth.tar --eval_set 3dpw

The evaluation metric is the Procrustes-Aligned Mean Per Joint Position Error (PA-MPJPE), in mm. The table also lists MPJPE, PVE, and ACCEL; lower is better for all four.

Models	PA-MPJPE ↓	MPJPE ↓	PVE ↓	ACCEL ↓
HMR (w/o 3DPW)	81.3	130.0	-	37.4
SPIN (w/o 3DPW)	59.2	96.9	116.4	29.8
MEVA (w/ 3DPW)	54.7	86.9	-	11.6
VIBE (w/o 3DPW)	56.5	93.5	113.4	27.1
VIBE (w/ 3DPW)	51.9	82.9	99.1	23.4
ours (w/o 3DPW)	50.7	88.8	104.5	18.0
ours (w/ 3DPW)	45.7	79.1	92.6	17.6
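For reference, the first two columns of the table above can be written out as follows. These are the definitions commonly used in the literature, given here as a sketch. The symbols are introduced only for illustration, and the exact joint set and alignment details used by eval.py are not stated in this document.

```latex
% \hat{p}_j: predicted 3D position of joint j
% p_j: ground-truth 3D position of joint j
% J: number of evaluated joints
\mathrm{MPJPE} = \frac{1}{J} \sum_{j=1}^{J} \lVert \hat{p}_j - p_j \rVert_2

% Procrustes alignment: the similarity transform (scale s, rotation R,
% translation t) that best maps the prediction onto the ground truth
(s^*, R^*, t^*) = \arg\min_{s, R, t} \sum_{j=1}^{J} \lVert s R \hat{p}_j + t - p_j \rVert_2^2

\mathrm{PA\text{-}MPJPE} = \frac{1}{J} \sum_{j=1}^{J} \lVert s^* R^* \hat{p}_j + t^* - p_j \rVert_2
```

Because the alignment removes global scale, rotation, and translation, PA-MPJPE is never larger than MPJPE for the same prediction, which is consistent with the two columns in the table.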

Citation

@inproceedings{wan2021,
  title={Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation},
  author={Ziniu Wan and Zhengjia Li and Maoqing Tian and Jianbo Liu and Shuai Yi and Hongsheng Li},
  booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
  year = {2021}
}

