MotionCLIP

Official Pytorch implementation of the paper "MotionCLIP: Exposing Human Motion Generation to CLIP Space".

Please visit our webpage for more details.

[teaser figure]

Bibtex

If you find this code useful in your research, please cite:

@article{tevet2022motionclip,
  title={MotionCLIP: Exposing Human Motion Generation to CLIP Space},
  author={Tevet, Guy and Gordon, Brian and Hertz, Amir and Bermano, Amit H and Cohen-Or, Daniel},
  journal={arXiv preprint arXiv:2203.08063},
  year={2022}
}

Getting started

1. Create conda environment

conda env create -f environment.yml
conda activate motionclip

The code was tested on Python 3.8 and PyTorch 1.8.1.

2. Download data

Download and unzip the following datasets and place them accordingly (see the expected layout after the list):

  • AMASS -> ./data/amass (download the SMPL+H version of each dataset separately; note that you must download ALL of the datasets listed on the AMASS website)
  • BABEL -> ./data/babel_v1.0_release
  • Rendered AMASS images -> ./data/render
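
Assuming the default paths above (and the SMPL+H model location from step 3 below), the resulting layout should look roughly like this:

./data/amass/                  # AMASS (SMPL+H version)
./data/babel_v1.0_release/     # BABEL labels
./data/render/                 # rendered AMASS images
./models/smplh/                # Extended SMPL+H body model (see step 3)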

3. Download the SMPL body model

bash prepare/download_smpl_files.sh

This will download the SMPL neutral model from this github repo, along with additional files.

In addition, download the Extended SMPL+H model (used in AMASS project) from MANO, and place it in ./models/smplh.

4. Parse data

Process the three datasets into a unified dataset with (text, image, motion) triplets.

To parse according to the AMASS split (for all applications except action recognition), run:

python -m src.datasets.amass_parser --dataset_name amass

Only if you intend to use Action Recognition, also run:

python -m src.datasets.amass_parser --dataset_name babel
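
The parser writes a unified database as a PyTorch file; the training commands below point to ./data/amass_db/amass_30fps_db.pt and ./data/amass_db/babel_30fps_db.pt. Here is a minimal sketch for inspecting the output, assuming only that the file loads with torch.load (the per-entry schema of the (text, image, motion) triplets is not specified here):

import torch

# Load the database produced by src.datasets.amass_parser.
db = torch.load('./data/amass_db/amass_30fps_db.pt')
print(type(db))
# If it is a list/dict of samples, inspect one entry to see the stored fields:
# print(db[0].keys())  # hypothetical: depends on the actual schema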

Using the pretrained model

First, download the pretrained model and place it at ./exps/paper-model.

1. Text-to-Motion

To reproduce paper results, run:

 python -m src.visualize.text2motion ./exps/paper-model/checkpoint_0100.pth.tar --input_file assets/paper_texts.txt

To run MotionCLIP with your own texts, create a text file in which each line is a different text input (see paper_texts.txt for reference), and point to it with --input_file instead.
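
For illustration only, a hypothetical input file (these lines are examples, not taken from paper_texts.txt) could contain:

a person is walking forward and waving
jumping jacks
sitting down on a chair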

2. Vector Editing

To reproduce paper results, run:

 python -m src.visualize.motion_editing ./exps/paper-model/checkpoint_0100.pth.tar --input_file assets/paper_edits.csv

To obtain the input motions, we support two modes:

  • data - Retrieve motions from the train/validation sets according to their textual label. On its first run, src.visualize.motion_editing generates a file listing all the textual labels; you can look it up and choose motions for your own edits.
  • text - The inputs are free texts instead of motions. We use the CLIP text encoder to get CLIP representations, perform vector editing, and then use the MotionCLIP decoder to output the edited motion (see the sketch below).

To run MotionCLIP on your own edits, create a csv file in which each line defines a different edit (see paper_edits.csv for reference), and point to it with --input_file instead.
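
For intuition, here is a minimal sketch of the text mode's latent arithmetic, assuming the OpenAI CLIP package; the MotionCLIP decoder call at the end is a hypothetical placeholder, and the real pipeline lives in src.visualize.motion_editing:

import clip
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
clip_model, _ = clip.load('ViT-B/32', device=device)

# Three texts define an edit: a base motion, an attribute to remove, and one to add.
texts = ['a person walks', 'walking', 'running']  # illustrative inputs
with torch.no_grad():
    emb = clip_model.encode_text(clip.tokenize(texts).to(device)).float()
base, src, dst = emb[0], emb[1], emb[2]

# Vector editing in CLIP space: swap the source attribute for the target one.
edited = base - src + dst

# The edited latent would then be decoded to a motion by the MotionCLIP decoder:
# motion = motionclip.decoder(edited.unsqueeze(0))  # hypothetical handle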

3. Interpolation

To reproduce paper results, run:

 python -m src.visualize.motion_interpolation ./exps/paper-model/checkpoint_0100.pth.tar --input_file assets/paper_interps.csv

To obtain the input motions, we use the data mode described above.

To run MotionCLIP on your own interpolations, create a csv file in which each line defines a different interpolation (see paper_interps.csv for reference), and point to it with --input_file instead.
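
Conceptually, the interpolation happens between latent codes. Here is a minimal sketch, assuming plain linear interpolation (the exact scheme used by src.visualize.motion_interpolation may differ):

import torch

def lerp(z0: torch.Tensor, z1: torch.Tensor, steps: int = 5) -> torch.Tensor:
    # Linearly interpolate between two latent codes, returning (steps, dim).
    ts = torch.linspace(0.0, 1.0, steps).view(-1, 1)
    return (1 - ts) * z0 + ts * z1

z0, z1 = torch.randn(512), torch.randn(512)  # stand-ins for two motion latents
zs = lerp(z0, z1)
# Each row of `zs` would then be decoded to a motion with the MotionCLIP decoder.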

4. Action Recognition

For action recognition, we use a model trained on text class names. Download it and place it at ./exps/classes-model.

python -m src.utils.action_classifier ./exps/classes-model/checkpoint_0200.pth.tar
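
Since the model is trained against text class names, recognition reduces to nearest-neighbor matching in CLIP space. Here is a minimal sketch, assuming the OpenAI CLIP package; the motion embedding below is a stand-in for the MotionCLIP encoder output:

import clip
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
clip_model, _ = clip.load('ViT-B/32', device=device)

class_names = ['walk', 'run', 'jump']  # illustrative action classes
with torch.no_grad():
    class_emb = clip_model.encode_text(clip.tokenize(class_names).to(device)).float()
class_emb = class_emb / class_emb.norm(dim=-1, keepdim=True)

motion_emb = torch.randn(1, 512, device=device)  # stand-in for the MotionCLIP encoder output
motion_emb = motion_emb / motion_emb.norm(dim=-1, keepdim=True)

# Predicted class = class name whose embedding is most cosine-similar to the motion.
pred = (motion_emb @ class_emb.t()).argmax(dim=-1)
print(class_names[pred.item()])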

Train your own

NOTE (11/MAY/22): The paper model is not perfectly reproduced by this code; we are working to resolve the issue. The trained model checkpoint we provide does reproduce the paper's results.

To reproduce paper-model, run:

python -m src.train.train --clip_text_losses cosine --clip_image_losses cosine --pose_rep rot6d \
--lambda_vel 100 --lambda_rc 100 --lambda_rcxyz 100 \
--jointstype vertices --batch_size 20 --num_frames 60 --num_layers 8 \
--lr 0.0001 --glob --translation --no-vertstrans --latent_dim 512 --num_epochs 500 --snapshot 10 \
--device <GPU DEVICE ID> \
--datapath ./data/amass_db/amass_30fps_db.pt \
--folder ./exps/my-paper-model

To reproduce classes-model, run:

python -m src.train.train --clip_text_losses cosine --clip_image_losses cosine --pose_rep rot6d \
--lambda_vel 95 --lambda_rc 95 --lambda_rcxyz 95 \
--jointstype vertices --batch_size 20 --num_frames 60 --num_layers 8 \
--lr 0.0001 --glob --translation --no-vertstrans --latent_dim 512 --num_epochs 500 --snapshot 10 \
--device <GPU DEVICE ID> \
--datapath ./data/amass_db/babel_30fps_db.pt \
--folder ./exps/my-classes-model
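
The --clip_text_losses cosine and --clip_image_losses cosine flags align the motion latent with the corresponding CLIP text and image embeddings. Here is a minimal sketch of such a cosine alignment loss (the exact formulation in src.train.train may differ):

import torch
import torch.nn.functional as F

def cosine_align_loss(motion_latent: torch.Tensor, clip_emb: torch.Tensor) -> torch.Tensor:
    # 1 - cosine similarity between motion latents (B, 512) and CLIP embeddings
    # (B, 512) of the same samples, averaged over the batch.
    return (1 - F.cosine_similarity(motion_latent, clip_emb, dim=-1)).mean()

loss = cosine_align_loss(torch.randn(4, 512), torch.randn(4, 512))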

Acknowledgment

The code for the transformer model and the dataloader is based on the ACTOR repository.

License

This code is distributed under an MIT LICENSE.

Note that our code depends on other libraries, including CLIP, SMPL, SMPL-X, PyTorch3D, and uses datasets which each have their own respective licenses that must also be followed.
