Multi-modal Content Creation Model Training Infrastructure including the FACT model (AI Choreographer) implementation.

Last update: Dec 30, 2022

Overview

AI Choreographer: Music Conditioned 3D Dance Generation with AIST++ [ICCV-2021].

This package contains the model implementation and training infrastructure of our AI Choreographer.

Get started

Pull the code

git clone https://github.com/liruilong940607/mint --recursive

Note that --recursive is important: it automatically clones the submodule (orbit) as well.

Install dependencies

conda create -n mint python=3.7
conda activate mint
conda install protobuf numpy
pip install tensorflow absl-py tensorflow-datasets librosa

sudo apt-get install libopenexr-dev
pip install --upgrade OpenEXR
pip install tensorflow-graphics tensorflow-graphics-gpu

git clone https://github.com/arogozhnikov/einops /tmp/einops
cd /tmp/einops/ && pip install . -U

git clone https://github.com/google/aistplusplus_api /tmp/aistplusplus_api
cd /tmp/aistplusplus_api && pip install -r requirements.txt && pip install . -U

Note: if you run into environment conflicts involving numpy, try pip install numpy==1.20.
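When chasing a version conflict, it can help to first see which versions are actually importable in the active environment. The snippet below is a minimal sketch, not part of this repository: the helper names `installed_version` and `report` are made up for illustration, and the default module list is simply taken from the install commands above.

```python
import importlib


def installed_version(module_name):
    """Return the module's __version__, or None if the import fails.

    Any exception counts as a failure, because a broken numpy/tensorflow
    pairing can raise errors other than ImportError at import time.
    """
    try:
        module = importlib.import_module(module_name)
    except Exception:
        return None
    return getattr(module, "__version__", "unknown")


def report(modules=("numpy", "tensorflow", "librosa", "einops")):
    """Map each module name to its version string (or None)."""
    return {name: installed_version(name) for name in modules}


if __name__ == "__main__":
    for name, version in report().items():
        print(f"{name}: {version or 'not importable'}")
```

A module reported as not importable right after installation is usually the one involved in the conflict.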

Get the data

See the website

Get the checkpoint

Download the checkpoints from Google Drive here, and put them in the folder ./checkpoints/

Run the code

compile protocols

protoc ./mint/protos/*.proto --python_out=.

preprocess the dataset into TFRecord files

python tools/preprocessing.py \
    --anno_dir="/mnt/data/aist_plusplus_final/" \
    --audio_dir="/mnt/data/AIST/music/" \
    --split=train
python tools/preprocessing.py \
    --anno_dir="/mnt/data/aist_plusplus_final/" \
    --audio_dir="/mnt/data/AIST/music/" \
    --split=testval

run training

python trainer.py --config_path ./configs/fact_v5_deeper_t10_cm12.config --model_dir ./checkpoints

Note: you may want to reduce the batch_size in the config file if you run into out-of-memory errors.
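If you prefer to script the batch-size change instead of editing the file by hand, something like the following could work. It is a sketch under assumptions: it treats the config as plain text containing entries of the form `batch_size: <integer>`, and the helper name `set_batch_size` is hypothetical. It rewrites every such entry, so check the result if the file defines separate training and evaluation batch sizes.

```python
import re


def set_batch_size(config_text, new_size):
    """Return config_text with every `batch_size: <int>` set to new_size."""
    if new_size < 1:
        raise ValueError("batch size must be a positive integer")
    pattern = re.compile(r"(\bbatch_size\s*:\s*)\d+")
    # A function replacement avoids ambiguity between the group
    # reference and the digits of the new value.
    updated, count = pattern.subn(
        lambda match: match.group(1) + str(new_size), config_text
    )
    if count == 0:
        raise ValueError("no batch_size field found in config")
    return updated


if __name__ == "__main__":
    # Assumed path, taken from the training command above.
    path = "./configs/fact_v5_deeper_t10_cm12.config"
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(set_batch_size(text, 8))
```

Halving the value repeatedly until training fits in memory is a common starting point.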

run testing and evaluation

# caching the generated motions (seed included) to `./outputs`
python evaluator.py --config_path ./configs/fact_v5_deeper_t10_cm12.config --model_dir ./checkpoints
# calculate FIDs
python tools/calculate_scores.py
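For reference, FID here stands for the Fréchet distance between feature distributions of generated and real motions. Assuming the script follows the standard definition (which motion features it uses is not stated in this README), each feature set is summarized by a mean and covariance, written below as $(\mu_g, \Sigma_g)$ for generated and $(\mu_r, \Sigma_r)$ for real motions, and the score is:

```latex
\mathrm{FID} = \lVert \mu_g - \mu_r \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_g + \Sigma_r
  - 2 \left( \Sigma_g \Sigma_r \right)^{1/2} \right)
```

Lower values indicate that the generated motions are statistically closer to the real ones.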

Citation

@inproceedings{li2021dance,
  title={AI Choreographer: Music Conditioned 3D Dance Generation with AIST++},
  author={Ruilong Li and Shan Yang and David A. Ross and Angjoo Kanazawa},
  booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
  year = {2021}
}

Owner

Google Research