"3D Human Texture Estimation from a Single Image with Transformers", ICCV 2021

Texformer: 3D Human Texture Estimation from a Single Image with Transformers

This is the official implementation of "3D Human Texture Estimation from a Single Image with Transformers", ICCV 2021 (Oral)

Highlights

  • Texformer: a novel structure combining Transformer and CNN
  • Low-Rank Attention layer (LoRA) with linear complexity (see the illustrative sketch after this list)
  • Combination of RGB UV map and texture flow
  • Part-style loss
  • Face-structure loss
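
The sketch below illustrates the general idea behind linear-complexity attention using a low-rank projection of the keys and values (in the spirit of Linformer-style factorization). It is only an illustrative example, not the exact Low-Rank Attention (LoRA) layer from the paper; the class name, arguments, and tensor shapes are assumptions.

import torch
import torch.nn as nn

# Illustrative only: attention whose cost is linear in the number of context
# tokens, obtained by compressing the keys/values to a small rank.
class LowRankAttention(nn.Module):
    def __init__(self, dim, num_tokens, rank=64):
        super().__init__()
        self.scale = dim ** -0.5
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        # Compress the N context tokens down to `rank` tokens, so the
        # attention matrix is N x rank instead of N x N.
        self.proj_k = nn.Linear(num_tokens, rank, bias=False)
        self.proj_v = nn.Linear(num_tokens, rank, bias=False)

    def forward(self, query, context):
        # query: (B, Nq, dim), context: (B, num_tokens, dim)
        q = self.to_q(query)
        k = self.proj_k(self.to_k(context).transpose(1, 2)).transpose(1, 2)  # (B, rank, dim)
        v = self.proj_v(self.to_v(context).transpose(1, 2)).transpose(1, 2)  # (B, rank, dim)
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)     # (B, Nq, rank)
        return attn @ v                                                      # (B, Nq, dim)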

BibTeX

@inproceedings{xu2021texformer,
  title={{3D} Human Texture Estimation from a Single Image with Transformers},
  author={Xu, Xiangyu and Loy, Chen Change},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  year={2021}
}

Abstract

We propose a Transformer-based framework for 3D human texture estimation from a single image. The proposed Transformer is able to effectively exploit the global information of the input image, overcoming the limitations of existing methods that are solely based on convolutional neural networks. In addition, we also propose a mask-fusion strategy to combine the advantages of the RGB-based and texture-flow-based models. We further introduce a part-style loss to help reconstruct high-fidelity colors without introducing unpleasant artifacts. Extensive experiments demonstrate the effectiveness of the proposed method against state-of-the-art 3D human texture estimation approaches both quantitatively and qualitatively.
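
The part-style loss mentioned above can be pictured roughly as a style (Gram-matrix) loss restricted to each body-part region. The sketch below is only a hedged illustration under that assumption; the function names, tensor shapes, and the exact formulation used in the paper may differ.

import torch

def masked_gram(feat, mask):
    # feat: (B, C, H, W) features; mask: (B, 1, H, W) binary mask of one part
    B, C, H, W = feat.shape
    f = (feat * mask).reshape(B, C, H * W)
    n = mask.reshape(B, 1, H * W).sum(dim=-1, keepdim=True).clamp(min=1.0)
    return (f @ f.transpose(1, 2)) / n  # (B, C, C), normalized by part area

def part_style_loss(pred_feat, target_feat, part_masks):
    # part_masks: (B, P, H, W), one binary channel per body part
    loss = pred_feat.new_zeros(())
    for p in range(part_masks.shape[1]):
        m = part_masks[:, p:p + 1]
        loss = loss + (masked_gram(pred_feat, m) - masked_gram(target_feat, m)).pow(2).mean()
    return loss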

Overview

Overview of Texformer

The Query is a pre-computed color encoding of the UV space, obtained by mapping the 3D coordinates of a standard human body mesh to the UV space. The Key is a concatenation of the input image and the 2D part-segmentation map. The Value is a concatenation of the input image and its 2D coordinates. We first feed the Query, Key, and Value into three CNNs to transform them into feature space. The multi-scale features are then sent to the Transformer units to generate the Output features. The multi-scale Output features are processed and fused in another CNN, which produces the RGB UV map T, the texture flow F, and the fusion mask M. The final UV map is generated by combining T with the textures sampled via F, blended using the fusion mask M. Note that there are skip connections between the same-resolution layers of the CNNs, similar to [1]; these are omitted from the figure for brevity.
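
A minimal sketch of the final fusion step described above, assuming T is a (B, 3, H, W) RGB UV map, the texture flow is a (B, H, W, 2) grid of sampling coordinates normalized to [-1, 1], and the fusion mask is a (B, 1, H, W) map in [0, 1]; these names and shapes are assumptions for illustration and need not match the released code exactly:

import torch
import torch.nn.functional as F

def fuse_texture(image, T, flow, mask):
    # Sample colors from the input image at the locations given by the
    # texture flow (coordinates normalized to [-1, 1]).
    sampled = F.grid_sample(image, flow, align_corners=False)  # (B, 3, H, W)
    # Blend the regressed RGB UV map with the flow-sampled texture using the
    # fusion mask.
    return mask * T + (1.0 - mask) * sampled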

Visual Results

For each example, the image on the left is the input, and the image on the right is the rendered 3D human, where the human texture is predicted by the proposed Texformer, and the geometry is predicted by RSC-Net.


Install

  • Manage the environment with Anaconda
conda create -n texformer anaconda
conda activate texformer
  • PyTorch 1.4, CUDA 9.2
conda install pytorch==1.4.0 torchvision==0.5.0 cudatoolkit=9.2 -c pytorch
  • Install the PyTorch neural renderer according to the instructions here
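
A quick sanity check (assuming the environment above) that the expected PyTorch and CUDA versions are visible before installing the renderer:

import torch
print(torch.__version__)          # expected: 1.4.0
print(torch.version.cuda)         # expected: 9.2
print(torch.cuda.is_available())  # True if a compatible GPU driver is present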

Download

  • Download meta data, and put it in "./meta/".

  • Download pretrained model, and put it in "./pretrained".

  • We propose an enhanced Market-1501 dataset, termed SMPLMarket, which equips the original Market-1501 data with SMPL estimates from RSC-Net and body-part segmentations from EANet. Please download the SMPLMarket dataset and put it in "./datasets/".

  • Other datasets: PRW, SURREAL, CUHK-SYSU. Please put these datasets in "./datasets/".

  • All the paths are set in "config.py".
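
Optionally, a quick check that the directory layout described above is in place; the folder names below follow this README, while the actual path variables live in "config.py" and may be named differently:

import os

# Expected layout as described in the Download section above.
for path in ['./meta', './pretrained', './datasets', './datasets/SMPLMarket']:
    print(f"{path}: {'found' if os.path.isdir(path) else 'missing'}")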

Demo

Run Texformer with human part segmentation from an off-the-shelf model:

python demo.py --img_path demo_imgs/img.png --seg_path demo_imgs/seg.png

If you don't want to run an external model for human part segmentation, you can use the part segmentation from RSC-Net instead (note that this may affect performance, since RSC-Net's segmentation is not very accurate due to the limitations of SMPL):

python demo.py --img_path demo_imgs/img.png

Train

Run the training code with default settings:

python trainer.py --exp_name texformer

Evaluation

Run the evaluation on the SMPLMarket dataset:

python eval.py --checkpoint_path ./pretrained/texformer_ep500.pt

References

[1] "3D Human Pose, Shape and Texture from Low-Resolution Images and Videos", IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.

[2] "3D Human Shape and Pose from a Single Low-Resolution Image with Self-Supervised Learning", ECCV, 2020.

[3] "SMPL: A Skinned Multi-Person Linear Model", SIGGRAPH Asia, 2015.

[4] "Learning Spatial and Spatio-Temporal Pixel Aggregations for Image and Video Denoising", IEEE Transactions on Image Processing, 2020.

[5] "Learning Factorized Weight Matrix for Joint Filtering", ICML, 2020.
