[BMVC2021] "TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation"

Last update: Dec 23, 2022

TransFusion-Pose

TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation
Haoyu Ma, Liangjian Chen, Deying Kong, Zhe Wang, Xingwei Liu, Hao Tang, Xiangyi Yan, Yusheng Xie, Shih-Yao Lin and Xiaohui Xie
In BMVC 2021
[Paper] [Video]

Overview

We propose TransFusion, which applies the transformer architecture to multi-view 3D human pose estimation.
We propose the Epipolar Field, a novel and more general form of the epipolar line. It integrates readily with the transformer through our proposed geometry positional encoding, which encodes the 3D relationships among different views.
Extensive experiments demonstrate that TransFusion outperforms previous fusion methods on both the Human 3.6M and Ski-Pose datasets while requiring substantially fewer parameters.
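
For readers unfamiliar with the classical epipolar line that the Epipolar Field generalizes, the following is a minimal NumPy sketch of that classical constraint only. It is not the Epipolar Field or the geometry positional encoding from the paper. The camera model (P1 = K1 [I | 0], P2 = K2 [R | t]), the toy intrinsics, and all function names are assumptions made for illustration.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_matrix(K1, K2, R, t):
    """Fundamental matrix for cameras P1 = K1 [I | 0] and P2 = K2 [R | t]."""
    return np.linalg.inv(K2).T @ skew(t) @ R @ np.linalg.inv(K1)

def epipolar_distance(F, x1, x2):
    """Pixel distance from x2 (view 2) to the epipolar line of x1 (view 1)."""
    line = F @ np.append(x1, 1.0)          # line coefficients (a, b, c)
    return abs(line @ np.append(x2, 1.0)) / np.hypot(line[0], line[1])

def project(K, R, t, X):
    """Project a 3D point X with a pinhole camera K [R | t]."""
    x = K @ (R @ X + t)
    return x[:2] / x[2]

# Toy two-view setup (hypothetical values, not taken from any dataset).
K = np.array([[1000.0, 0.0, 500.0],
              [0.0, 1000.0, 500.0],
              [0.0, 0.0, 1.0]])
a = np.deg2rad(20.0)
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([-1.0, 0.0, 0.2])
X = np.array([0.3, -0.2, 4.0])

x1 = project(K, np.eye(3), np.zeros(3), X)
x2 = project(K, R, t, X)
F = fundamental_matrix(K, K, R, t)

d_on = epipolar_distance(F, x1, x2)                           # true correspondence
d_off = epipolar_distance(F, x1, x2 + np.array([0.0, 25.0]))  # shifted off the line
print(d_on, d_off)
```

The true correspondence lies on the epipolar line (distance near zero), while the shifted point does not. A hard line constraint of this kind is what a softer, field-like formulation relaxes.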

Installation

Clone this repo. We refer to the cloned directory as ${POSE_ROOT}.

git clone https://github.com/HowieMa/TransFusion-Pose.git

Install dependencies.

pip install -r requirements.txt

Download TransPose models pretrained on COCO.

wget https://github.com/yangsenius/TransPose/releases/download/Hub/tp_r_256x192_enc3_d256_h1024_mh8.pth

You can also download it from the official website of TransPose.

Place the downloaded model under ${POSE_ROOT}/models so that the directory looks like this:

${POSE_ROOT}/models
└── pytorch
    └── coco
        └── tp_r_256x192_enc3_d256_h1024_mh8.pth
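
As an optional convenience, the download and placement steps above can be sketched in Python. The URL and target path are the ones given in this README; the function names are hypothetical and the script is not part of the repository.

```python
from pathlib import Path
from urllib.request import urlretrieve

CHECKPOINT_NAME = "tp_r_256x192_enc3_d256_h1024_mh8.pth"
CHECKPOINT_URL = (
    "https://github.com/yangsenius/TransPose/releases/download/Hub/"
    + CHECKPOINT_NAME
)

def pretrained_checkpoint_path(pose_root):
    """Location where this README expects the COCO-pretrained TransPose model."""
    return Path(pose_root) / "models" / "pytorch" / "coco" / CHECKPOINT_NAME

def ensure_pretrained_checkpoint(pose_root):
    """Download the checkpoint only if it is not already in place."""
    target = pretrained_checkpoint_path(pose_root)
    if not target.is_file():
        target.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(CHECKPOINT_URL, str(target))
    return target
```

Calling `ensure_pretrained_checkpoint("/path/to/TransFusion-Pose")` returns the checkpoint path and skips the download when the file already exists.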

Data preparation

Human 3.6M

For Human 3.6M, please follow H36M-Toolbox to prepare the images and annotations.

Ski-Pose

For Ski-Pose, please follow the instructions on the dataset website to obtain the data.
After downloading Ski-PosePTZ-CameraDataset-png.zip and ski_centers.csv, unzip the archive and put everything into a single folder, referred to as ${SKI_ROOT}.
Run python data/preprocess_skipose.py ${SKI_ROOT} to convert it to the expected format.

Your folder should look like this:

${POSE_ROOT}
|-- data
    |-- h36m
    |   |-- annot
    |   |   |-- h36m_train.pkl
    |   |   |-- h36m_validation.pkl
    |   |-- images
    |       |-- s_01_act_02_subact_01_ca_01
    |       |-- s_01_act_02_subact_01_ca_02
    |-- preprocess_skipose.py
    |-- skipose
        |-- annot
        |   |-- ski_train.pkl
        |   |-- ski_validation.pkl
        |-- images
            |-- seq_103
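
Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch that checks only the annotation files and image folders named in the tree; the helper name and constants are hypothetical and not part of the repository.

```python
from pathlib import Path

# Annotation files and image folders named in the layout above.
EXPECTED_DATA_FILES = [
    "data/h36m/annot/h36m_train.pkl",
    "data/h36m/annot/h36m_validation.pkl",
    "data/skipose/annot/ski_train.pkl",
    "data/skipose/annot/ski_validation.pkl",
]
EXPECTED_DATA_DIRS = [
    "data/h36m/images",
    "data/skipose/images",
]

def missing_data_entries(pose_root):
    """Return the expected files and folders that are absent under pose_root."""
    root = Path(pose_root)
    missing = [p for p in EXPECTED_DATA_FILES if not (root / p).is_file()]
    missing += [p for p in EXPECTED_DATA_DIRS if not (root / p).is_dir()]
    return missing
```

An empty list from `missing_data_entries("/path/to/TransFusion-Pose")` means every entry listed above was found. If you only use one of the two datasets, the entries for the other can be ignored.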

Training and Testing

Human 3.6M

# Training
python run/pose2d/train.py --cfg experiments-local/h36m/transpose/256_fusion_enc3_GPE.yaml --gpus 0,1,2,3

# Evaluation (2D)
python run/pose2d/valid.py --cfg experiments-local/h36m/transpose/256_fusion_enc3_GPE.yaml --gpus 0,1,2,3  

# Evaluation (3D)
python run/pose3d/estimate_tri.py --cfg experiments-local/h36m/transpose/256_fusion_enc3_GPE.yaml

Ski-Pose

# Training
python run/pose2d/train.py --cfg experiments-local/skipose/transpose/256_fusion_enc3_GPE.yaml --gpus 0,1,2,3

# Evaluation (2D)
python run/pose2d/valid.py --cfg experiments-local/skipose/transpose/256_fusion_enc3_GPE.yaml --gpus 0,1,2,3

# Evaluation (3D)
python run/pose3d/estimate_tri.py --cfg experiments-local/skipose/transpose/256_fusion_enc3_GPE.yaml

Our trained models can be downloaded from here
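
The 3D evaluation script is named estimate_tri.py, which suggests that 3D poses are obtained by triangulating the 2D predictions from multiple views. As background, here is a minimal sketch of linear (DLT) triangulation of a single joint with NumPy. It is an illustration under that assumption, not the repository's implementation; the function name and the toy cameras are hypothetical.

```python
import numpy as np

def triangulate_dlt(projections, points_2d):
    """Triangulate one 3D point from two or more views.

    projections: sequence of 3x4 camera projection matrices.
    points_2d:   sequence of (u, v) pixel coordinates, one per view.
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

# Toy two-view example (hypothetical cameras, noise-free 2D points).
K = np.array([[1000.0, 0.0, 500.0],
              [0.0, 1000.0, 500.0],
              [0.0, 0.0, 1.0]])
a = np.deg2rad(20.0)
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([[-1.0], [0.0], [0.2]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])

X_true = np.array([0.3, -0.2, 4.0])
points = []
for P in (P1, P2):
    x = P @ np.append(X_true, 1.0)
    points.append(x[:2] / x[2])

recovered = triangulate_dlt([P1, P2], points)
print(recovered)
```

With noise-free inputs the recovered point matches the original up to numerical precision. With real 2D predictions, accuracy depends on how consistent the detections are across views, which is what cross-view fusion aims to improve.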

Citation

If you find our code helpful for your research, please cite the paper:

@inproceedings{ma2021transfusion,
  title={TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation},
  author={Ma, Haoyu and Chen, Liangjian and Kong, Deying and Wang, Zhe and Liu, Xingwei and Tang, Hao and Yan, Xiangyi and Xie, Yusheng and Lin, Shih-Yao and Xie, Xiaohui},
  booktitle={British Machine Vision Conference},
  year={2021}
}
