Symbolic Music Generation with Diffusion Models

Last update: Jan 07, 2023

Supplementary code release for our work Symbolic Music Generation with Diffusion Models.

Installation

All code is written in Python 3 (Anaconda recommended). To install the dependencies:

pip install -r requirements.txt

A copy of the Magenta codebase is required for access to MusicVAE and related components. Installation instructions can be found on the Magenta public repository. You will also need to download pretrained MusicVAE checkpoints. For our experiments, we use the 2-bar melody model.
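Before running any of the scripts, it can help to confirm that the required packages are importable. The snippet below is a minimal sketch using only the standard library. The module names passed in are assumptions: `magenta` is the import name of the Magenta package and `tensorflow` is one of its dependencies, so adjust the list to match your requirements.txt.

```python
import importlib.util


def check_modules(names):
    """Return {module_name: bool} telling whether each top-level module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed module names; edit to match your environment.
    for name, found in check_modules(["magenta", "tensorflow"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

This only checks that the modules can be located. It does not verify versions or that the MusicVAE checkpoint has been downloaded.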

Datasets

We use the Lakh MIDI Dataset to train our models. Follow these instructions to download and build the Lakh MIDI Dataset.

To encode the Lakh dataset with MusicVAE, use scripts/generate_song_data_beam.py:

python scripts/generate_song_data_beam.py \
  --checkpoint=/path/to/musicvae-ckpt \
  --input=/path/to/lakh_tfrecords \
  --output=/path/to/encoded_tfrecords

To preprocess and generate fixed-length latent sequences for training diffusion and autoregressive models, refer to scripts/transform_encoded_data.py:

python scripts/transform_encoded_data.py \
  --encoded_data=/path/to/encoded_tfrecords \
  --output_path=/path/to/preprocess_tfrecords \
  --mode=sequences \
  --context_length=32
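To illustrate what a fixed-length latent sequence is, the following sketch cuts one song's sequence of latent vectors into windows of `context_length` vectors. It is not the code in scripts/transform_encoded_data.py. The function name, the `stride` parameter, and the choice to drop a trailing partial window are assumptions made for illustration.

```python
import numpy as np


def split_into_windows(latents, context_length=32, stride=None):
    """Cut a (num_chunks, latent_dim) array into (num_windows, context_length, latent_dim).

    Hypothetical helper: a trailing partial window is dropped, and windows
    do not overlap unless a smaller stride is given.
    """
    latents = np.asarray(latents)
    stride = context_length if stride is None else stride
    starts = range(0, latents.shape[0] - context_length + 1, stride)
    windows = [latents[s:s + context_length] for s in starts]
    if not windows:
        # Song is shorter than one window: return an empty batch.
        return np.empty((0, context_length, latents.shape[1]), dtype=latents.dtype)
    return np.stack(windows)


if __name__ == "__main__":
    song = np.random.default_rng(0).standard_normal((70, 512))  # 70 encoded chunks, assumed 512-dim
    print(split_into_windows(song).shape)
```

With a stride of 1 the same function yields every overlapping window, which produces more training examples from each song at the cost of redundancy.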

Training

Diffusion

python train_ncsn.py --flagfile=configs/ddpm-mel-32seq-512.cfg

TransformerMDN

python train_mdn.py --flagfile=configs/mdn-mel-32seq-512.cfg

Sampling and Generation

Diffusion

python sample_ncsn.py \
  --flagfile=configs/ddpm-mel-32seq-512.cfg \
  --sample_seed=42 \
  --sample_size=1000 \
  --sampling_dir=/path/to/latent-samples
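For readers unfamiliar with diffusion sampling, the sketch below shows generic DDPM ancestral sampling: start from Gaussian noise and repeatedly remove the noise predicted by a model. It is not the implementation in sample_ncsn.py. The function name, the noise schedule, the sample shape, and the toy noise predictor are all assumptions, and a trained network would take the place of `toy_model`.

```python
import numpy as np


def ddpm_sample(predict_noise, shape, betas, rng):
    """Generic DDPM ancestral sampling.

    predict_noise(x, t) must return an array shaped like x: the model's
    estimate of the noise contained in x at integer step t.
    """
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)  # start from pure Gaussian noise
    for t in reversed(range(len(betas))):
        eps = predict_noise(x, t)
        # Mean of x_{t-1} given x_t and the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Variance choice sigma_t^2 = beta_t.
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)
        else:
            x = mean  # no noise is added at the final step
    return x


if __name__ == "__main__":
    betas = np.linspace(1e-4, 0.02, 1000)  # illustrative schedule, not read from the .cfg files
    alpha_bars = np.cumprod(1.0 - betas)
    # Toy stand-in for a trained network: the exact noise estimate when the data is N(0, I).
    toy_model = lambda x, t: np.sqrt(1.0 - alpha_bars[t]) * x
    samples = ddpm_sample(toy_model, (4, 32, 512), betas, np.random.default_rng(42))
    print(samples.shape)
```

Passing a seeded generator makes the output reproducible, which is the role `--sample_seed` plays in the command above.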

TransformerMDN

python sample_mdn.py \
  --flagfile=configs/mdn-mel-32seq-512.cfg \
  --sample_seed=42 \
  --sample_size=1000 \
  --sampling_dir=/path/to/latent-samples

Decoding sequences

To convert sequences of embeddings (generated by diffusion or TransformerMDN models) to sequences of MIDI events, refer to scripts/sample_audio.py.

python scripts/sample_audio.py \
  --input=/path/to/latent-samples/[ncsn|mdn] \
  --output=/path/to/audio-midi \
  --n_synth=1000 \
  --include_wav=True
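Conceptually, each embedding in a generated sequence decodes to a short melody, and the melodies are placed back to back to form the full piece. The sketch below shows only that stitching step, on plain tuples. It is not taken from scripts/sample_audio.py. The function name and the `(start, end, pitch)` note representation are assumptions, and the chunk duration of 4 seconds assumes 2 bars in 4/4 at 120 beats per minute.

```python
def stitch_chunks(chunks, chunk_seconds):
    """Concatenate per-chunk note lists into one timeline.

    Each chunk is a list of (start, end, pitch) tuples with times in seconds
    measured from the start of that chunk (hypothetical representation).
    """
    song = []
    for index, notes in enumerate(chunks):
        offset = index * chunk_seconds  # shift this chunk to its place in the song
        for start, end, pitch in notes:
            song.append((start + offset, end + offset, pitch))
    return song


if __name__ == "__main__":
    decoded = [
        [(0.0, 0.5, 60), (0.5, 1.0, 62)],  # notes decoded from the first embedding
        [(1.0, 1.5, 64)],                  # notes decoded from the second embedding
    ]
    for note in stitch_chunks(decoded, chunk_seconds=4.0):
        print(note)
```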

Citing

If you use this code, please cite it as:

@inproceedings{
  mittal2021symbolicdiffusion,
  title={Symbolic Music Generation with Diffusion Models},
  author={Gautam Mittal and Jesse Engel and Curtis Hawthorne and Ian Simon},
  booktitle={Proceedings of the 22nd International Society for Music Information Retrieval Conference},
  year={2021},
  url={https://archives.ismir.net/ismir2021/paper/000058.pdf}
}

Note

This is not an official Google product.
