Video Background Music Generation with Controllable Music Transformer (ACM MM 2021 Oral)

Overview

CMT

Code for paper Video Background Music Generation with Controllable Music Transformer (ACM MM 2021 Best Paper Award)

[Paper] [Site]

Directory Structure

  • src/: code of the whole pipeline

    • train.py: training script, take a npz as input music data to train the model

    • model.py: code of the model

    • gen_midi_conditional.py: inference script, take a npz (represents a video) as input to generate several songs

    • src/video2npz/: convert video into npz by extracting motion saliency and motion speed

  • dataset/: processed dataset for training, in the format of npz

  • logs/: logs that automatically generate during training, can be used to track training process

  • exp/: checkpoints, named after val loss (e.g. loss_13_params.pt)

  • inference/: processed video for inference (.npz), and generated music(.mid)

Preparation

  • clone this repo

  • download lpd_5_prcem_mix_v8_10000.npz from HERE and put it under dataset/

  • download pretrained model loss_8_params.pt from HERE and put it under exp/

  • install ffmpeg=3.2.4

  • prepare a Python3 conda environment

    pip install -r py3_requirements.txt
  • prepare a Python2 conda environment (for extracting visbeat)

    • pip install -r py2_requirements.txt
    • open visbeat package directory (e.g. anaconda3/envs/XXXX/lib/python2.7/site-packages/visbeat), replace the original Video_CV.py with src/video2npz/Video_CV.py

Training

  • If you want to use another training set: convert training data from midi into npz under dataset/

    python midi2numpy_mix.py --midi_dir /PATH/TO/MIDIS/ --out_name data.npz 
  • train the model

    python train.py -n XXX -g 0 1 2 3
    
    # -n XXX: the name of the experiment, will be the name of the log file & the checkpoints directory. if XXX is 'debug', checkpoints will not be saved
    # -l (--lr): initial learning rate
    # -b (--batch_size): batch size
    # -p (--path): if used, load model checkpoint from the given path
    # -e (--epochs): number of epochs in training
    # -t (--train_data): path of the training data (.npz file) 
    # -g (--gpus): ids of gpu
    # other model hyperparameters: modify the source .py files

Inference

  • convert input video (MP4 format) into npz (use the Python2 environment)

    cd src/video2npz
    sh video2npz.sh ../../videos/xxx.mp4
    • try resizing the video if this takes a long time
  • run model to generate .mid :

    python gen_midi_conditional.py -f "../inference/xxx.npz" -c "../exp/loss_8_params.pt"
    
    # -c: checkpoints to be loaded
    # -f: input npz file
    # -g: id of gpu (only one gpu is needed for inference) 
    • if using another training set, change decoder_n_class in gen_midi_conditional to the decoder_n_class in train.py
  • convert midi into audio: use GarageBand (recommended) or midi2audio

    • set tempo to the value of tempo in video2npz/metadata.json
  • combine original video and audio into video with BGM

    ffmpeg -i 'xxx.mp4' -i 'yyy.mp3' -c:v copy -c:a aac -strict experimental -map 0:v:0 -map 1:a:0 'zzz.mp4'
    
    # xxx.mp4: input video
    # yyy.mp3: audio file generated in the previous step
    # zzz.mp4: output video
Owner
Zhaokai Wang
Undergraduate student from Beihang University
Zhaokai Wang
Project page of the paper 'Analyzing Perception-Distortion Tradeoff using Enhanced Perceptual Super-resolution Network' (ECCVW 2018)

EPSR (Enhanced Perceptual Super-resolution Network) paper This repo provides the test code, pretrained models, and results on benchmark datasets of ou

Subeesh Vasu 78 Nov 19, 2022
Motion planning environment for Sampling-based Planners

Sampling-Based Motion Planners' Testing Environment Sampling-based motion planners' testing environment (sbp-env) is a full feature framework to quick

Soraxas 23 Aug 23, 2022
Using python and scikit-learn to make stock predictions

MachineLearningStocks in python: a starter project and guide EDIT as of Feb 2021: MachineLearningStocks is no longer actively maintained MachineLearni

Robert Martin 1.3k Dec 29, 2022
Human4D Dataset tools for processing and visualization

HUMAN4D: A Human-Centric Multimodal Dataset for Motions & Immersive Media HUMAN4D constitutes a large and multimodal 4D dataset that contains a variet

tofis 15 Nov 09, 2022
Official code repository for A Simple Long-Tailed Rocognition Baseline via Vision-Language Model.

This is the official code repository for A Simple Long-Tailed Rocognition Baseline via Vision-Language Model.

peng gao 42 Nov 26, 2022
Attention over nodes in Graph Neural Networks using PyTorch (NeurIPS 2019)

Intro This repository contains code to generate data and reproduce experiments from our NeurIPS 2019 paper: Boris Knyazev, Graham W. Taylor, Mohamed R

Boris Knyazev 242 Jan 06, 2023
MAME is a multi-purpose emulation framework.

MAME's purpose is to preserve decades of software history. As electronic technology continues to rush forward, MAME prevents this important "vintage" software from being lost and forgotten.

Michael Murray 6 Oct 25, 2020
This repository is a basic Machine Learning train & validation Template (Using PyTorch)

pytorch_ml_template This repository is a basic Machine Learning train & validation Template (Using PyTorch) TODO Markdown 사용법 Build Docker 사용법 Anacond

1 Sep 15, 2022
The official repository for "Revealing unforeseen diagnostic image features with deep learning by detecting cardiovascular diseases from apical four-chamber ultrasounds"

Revealing unforeseen diagnostic image features with deep learning by detecting cardiovascular diseases from apical four-chamber ultrasounds The why Im

3 Mar 29, 2022
A simple but complete full-attention transformer with a set of promising experimental features from various papers

x-transformers A concise but fully-featured transformer, complete with a set of promising experimental features from various papers. Install $ pip ins

Phil Wang 2.3k Jan 03, 2023
Official repository of the paper "A Variational Approximation for Analyzing the Dynamics of Panel Data". Mixed Effect Neural ODE. UAI 2021.

Official repository of the paper (UAI 2021) "A Variational Approximation for Analyzing the Dynamics of Panel Data", Mixed Effect Neural ODE. Panel dat

Jurijs Nazarovs 7 Nov 26, 2022
A hobby project which includes a hand-gesture based virtual piano using a mobile phone camera and OpenCV library functions

Overview This is a hobby project which includes a hand-gesture controlled virtual piano using an android phone camera and some OpenCV library. My moti

Abhinav Gupta 1 Nov 19, 2021
python library for invisible image watermark (blind image watermark)

invisible-watermark invisible-watermark is a python library and command line tool for creating invisible watermark over image.(aka. blink image waterm

Shield Mountain 572 Jan 07, 2023
Omnidirectional camera calibration in python

Omnidirectional Camera Calibration Key features pure python initial solution based on A Toolbox for Easily Calibrating Omnidirectional Cameras (Davide

Thomas Pönitz 12 Nov 22, 2022
Official implementation of "OpenPifPaf: Composite Fields for Semantic Keypoint Detection and Spatio-Temporal Association" in PyTorch.

openpifpaf Continuously tested on Linux, MacOS and Windows: New 2021 paper: OpenPifPaf: Composite Fields for Semantic Keypoint Detection and Spatio-Te

VITA lab at EPFL 50 Dec 29, 2022
All of the figures and notebooks for my deep learning book, for free!

"Deep Learning - A Visual Approach" by Andrew Glassner This is the official repo for my book from No Starch Press. Ordering the book My book is called

Andrew Glassner 227 Jan 04, 2023
Model-based 3D Hand Reconstruction via Self-Supervised Learning, CVPR2021

S2HAND: Model-based 3D Hand Reconstruction via Self-Supervised Learning S2HAND presents a self-supervised 3D hand reconstruction network that can join

Yujin Chen 72 Dec 12, 2022
Pytorch implementation of FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks

flownet2-pytorch Pytorch implementation of FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks. Multiple GPU training is supported, a

NVIDIA Corporation 2.8k Dec 27, 2022
Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning using 🤗 transformers

hierarchical-transformer-1d Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning using 🤗 transformers In Progress!! 2021.

MyungHoon Jin 7 Nov 06, 2022
On Out-of-distribution Detection with Energy-based Models

On Out-of-distribution Detection with Energy-based Models This repository contains the code for the experiments conducted in the paper On Out-of-distr

Sven 19 Aug 07, 2022