[ICCV 2021] Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation

Overview

MAED: Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation

Getting Started

Our codes are implemented and tested with python 3.6 and pytorch 1.5.

Install Pytorch following the official guide on Pytorch website.

And install the requirements using virtualenv or conda:

pip install -r requirements.txt

Data Preparation

Refer to data.md for instructions.

Training

Stage 1 training

Generally, you can use the distributed launch script of pytorch to start training.

For example, for a training on 2 nodes, 4 gpus each (2x4=8 gpus total): On node 0, run:

python -u -m torch.distributed.launch \
    --nnodes=2 \
    --node_rank=0 \
    --nproc_per_node=4 \
    --master_port=<MASTER_PORT> \
    --master_addr=<MASTER_NODE_ID> \
    --use_env \
    train.py --cfg configs/config_stage1.yaml

On node 1, run:

python -u -m torch.distributed.launch \
    --nnodes=2 \
    --node_rank=1 \
    --nproc_per_node=4 \
    --master_port=<MASTER_PORT> \
    --master_addr=<MASTER_NODE_ID> \
    --use_env \
    train.py --cfg configs/config_stage1.yaml

Otherwise, if you are using task scheduling system such as Slurm to submit your training tasks, you can refer to this script to start your training:

# training on 2 nodes, 4 gpus each (2x4=8 gpus total)
sh scripts/run.sh 2 4 configs/config_stage1.yaml

The checkpoint of training will be saved in [results/] by default. You are free to modify it in the config file.

Stage 2 training

Use the last checkpoint of stage 1 to initialize the model and starts training stage 2.

# On Node 0.
python -u -m torch.distributed.launch \
    --nnodes=2 \
    --node_rank=0 \
    --nproc_per_node=4 \
    --master_port=<MASTER_PORT> \
    --master_addr=<MASTER_NODE_ID> \
    --use_env \
    train.py --cfg configs/config_stage2.yaml --pretrained <PATH_TO_CHECKPOINT_FILE>

Similar on node 1.

Evaluation

To evaluate model on 3dpw test set:

python eval.py --cfg <PATH_TO_EXPERIMENT>/config.yaml --checkpoint <PATH_TO_EXPERIMENT>/model_best.pth.tar --eval_set 3dpw

Evaluation metric is Procrustes Aligned Mean Per Joint Position Error (PA-MPJPE) in mm.

Models PA-MPJPE ↓ MPJPE ↓ PVE ↓ ACCEL ↓
HMR (w/o 3DPW) 81.3 130.0 - 37.4
SPIN (w/o 3DPW) 59.2 96.9 116.4 29.8
MEVA (w/ 3DPW) 54.7 86.9 - 11.6
VIBE (w/o 3DPW) 56.5 93.5 113.4 27.1
VIBE (w/ 3DPW) 51.9 82.9 99.1 23.4
ours (w/o 3DPW) 50.7 88.8 104.5 18.0
ours (w/ 3DPW) 45.7 79.1 92.6 17.6

Citation

@inproceedings{wan2021,
  title={Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation},
  author={Ziniu Wan, Zhengjia Li, Maoqing Tian, Jianbo Liu, Shuai Yi, Hongsheng Li},
  booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
  year = {2021}
}
Owner
Face and Body Tracking for VRM 3D models on the web.

Kalidoface 3D - Face and Full-Body tracking for Vtubing on the web! A sequal to Kalidoface which supports Live2D avatars, Kalidoface 3D is a web app t

Rich 257 Jan 02, 2023
Project to create an open-source 6 DoF input device

6DInputs A Project to create open-source 3D printed 6 DoF input devices Note the plural ('6DInputs' and 'devices') in the headings. We would like seve

RepRap Ltd 47 Jul 28, 2022
Official implementation of the paper "Topographic VAEs learn Equivariant Capsules"

Topographic Variational Autoencoder Paper: https://arxiv.org/abs/2109.01394 Getting Started Install requirements with Anaconda: conda env create -f en

T. Andy Keller 69 Dec 12, 2022
[CVPR 2021 Oral] ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis

ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis [arxiv|pdf|v

Yinan He 78 Dec 22, 2022
Exploring whether attention is necessary for vision transformers

Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet Paper/Report TL;DR We replace the attention layer in a v

Luke Melas-Kyriazi 461 Jan 07, 2023
Spherical Confidence Learning for Face Recognition, accepted to CVPR2021.

Sphere Confidence Face (SCF) This repository contains the PyTorch implementation of Sphere Confidence Face (SCF) proposed in the CVPR2021 paper: Shen

Maths 70 Dec 09, 2022
Implementation of Shape Generation and Completion Through Point-Voxel Diffusion

Shape Generation and Completion Through Point-Voxel Diffusion Project | Paper Implementation of Shape Generation and Completion Through Point-Voxel Di

Linqi Zhou 103 Dec 29, 2022
Repository for the electrical and ICT benchmark model developed in the ERIGrid 2.0 project.

Benchmark Model Electrical and ICT System This repository contains the documentation, code, and models for the electrical and ICT benchmark model deve

ERIGrid 2.0 1 Nov 29, 2021
Code release for ICCV 2021 paper "Anticipative Video Transformer"

Anticipative Video Transformer Ranked first in the Action Anticipation task of the CVPR 2021 EPIC-Kitchens Challenge! (entry: AVT-FB-UT) [project page

Facebook Research 123 Dec 13, 2022
Fast Axiomatic Attribution for Neural Networks (NeurIPS*2021)

Fast Axiomatic Attribution for Neural Networks This is the official repository accompanying the NeurIPS 2021 paper: R. Hesse, S. Schaub-Meyer, and S.

Visual Inference Lab @TU Darmstadt 11 Nov 21, 2022
Boosted CVaR Classification (NeurIPS 2021)

Boosted CVaR Classification Runtian Zhai, Chen Dan, Arun Sai Suggala, Zico Kolter, Pradeep Ravikumar NeurIPS 2021 Table of Contents Quick Start Train

Runtian Zhai 4 Feb 15, 2022
A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization

MADGRAD Optimization Method A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization pip install madgrad Try it out! A best

Meta Research 774 Dec 31, 2022
Experiments with the Robust Binary Interval Search (RBIS) algorithm, a Query-Based prediction algorithm for the Online Search problem.

OnlineSearchRBIS Online Search with Best-Price and Query-Based Predictions This is the implementation of the Robust Binary Interval Search (RBIS) algo

S. K. 1 Apr 16, 2022
FANet - Real-time Semantic Segmentation with Fast Attention

FANet Real-time Semantic Segmentation with Fast Attention Ping Hu, Federico Perazzi, Fabian Caba Heilbron, Oliver Wang, Zhe Lin, Kate Saenko , Stan Sc

Ping Hu 42 Nov 30, 2022
A computer vision pipeline to identify the "icons" in Christian paintings

Christian-Iconography A computer vision pipeline to identify the "icons" in Christian paintings. A bit about iconography. Iconography is related to id

Rishab Mudliar 3 Jul 30, 2022
This repository contains the data and code for the paper "Diverse Text Generation via Variational Encoder-Decoder Models with Gaussian Process Priors" ([email protected])

GP-VAE This repository provides datasets and code for preprocessing, training and testing models for the paper: Diverse Text Generation via Variationa

Wanyu Du 18 Dec 29, 2022
CSD: Consistency-based Semi-supervised learning for object Detection

CSD: Consistency-based Semi-supervised learning for object Detection (NeurIPS 2019) By Jisoo Jeong, Seungeui Lee, Jee-soo Kim, Nojun Kwak Installation

80 Dec 15, 2022
tsflex - feature-extraction benchmarking

tsflex - feature-extraction benchmarking This repository withholds the benchmark results and visualization code of the tsflex paper and toolkit. Flow

PreDiCT.IDLab 5 Mar 25, 2022
Testing the Facial Emotion Recognition (FER) algorithm on animations

PegHeads-Tutorial-3 Testing the Facial Emotion Recognition (FER) algorithm on animations

PegHeads Inc 2 Jan 03, 2022
Low-code/No-code approach for deep learning inference on devices

EzEdgeAI A concept project that uses a low-code/no-code approach to implement deep learning inference on devices. It provides a componentized framewor

On-Device AI Co., Ltd. 7 Apr 05, 2022