Music Classification: Beyond Supervised Learning, Towards Real-world Applications

Last update: Dec 15, 2022

About the book

This web book was written for a tutorial session at the 22nd International Society for Music Information Retrieval Conference (ISMIR), held online on Nov 8-12, 2021. The ISMIR conference is the world’s leading research forum on processing, searching, organising and accessing music-related data.

Motivation

Lower the barrier: With the emergence of deep learning, music classification research has entered a new phase, and many data-driven approaches have been proposed. However, researchers sometimes use jargon inconsistently. In addition, some implementation details and evaluation methods are described ambiguously in papers, which makes the information hard to access without personal contact with the authors. These are serious obstacles for new researchers who want to dive into this fascinating research area. Through this book, we would like to lower the barrier for newcomers and reduce miscommunication between researchers by sharing these secrets.

Cope with data issues: Another issue we face in the deep learning era is the shortage of labeled data. Labeling musical attributes requires strong domain knowledge and a significant amount of listening time, which makes it expensive. Because of this, deep learning researchers have started to actively use large-scale unlabeled data. This book introduces recent advances in semi- and self-supervised learning that enable music classification models to go beyond supervised learning.

Narrow the gap: Music classification has been applied successfully to real-world problems. However, some important procedures and considerations for real-world applications are rarely discussed as research topics. In this book, drawing on the authors' industry experience, we try to raise awareness of these questions and offer answers and perspectives. We hope this helps academia and industry work together more smoothly.

About the authors

Minz Won is a Ph.D. candidate at the Music Technology Group (MTG) of Universitat Pompeu Fabra in Barcelona, Spain. His research focus is music representation learning. Alongside his academic career, he has put his knowledge into practice through industry internships at Kakao Corp., Naver Corp., Pandora, and Adobe, and he recently joined ByteDance as a research scientist. He contributed to the winning entry in the WWW 2018 Challenge: Learning to Recognize Musical Genre.

Janne Spijkervet graduated from the University of Amsterdam in 2021 with her Master's thesis titled "Contrastive Learning of Musical Representations". A paper with the same title, on self-supervised learning from raw audio for music tagging, was published in 2020. She has been a research scientist at ByteDance since 2020, developing generative models for music creation. She is also a songwriter and music producer, and explores the design and use of machine learning technology in her music.

Keunwoo Choi is a senior research scientist at ByteDance, developing machine learning products for music recommendation and discovery. He received a Ph.D. degree from Queen Mary University of London (C4DM) in 2018. As a researcher, he has also worked at Spotify (2018-2020) and several other music companies, and has contributed to open-source projects such as Kapre, librosa, and torchaudio. He also writes some music.

Citing this book

@book{musicclassification:book,
	Author = {Minz Won and Janne Spijkervet and Keunwoo Choi},
	Month = nov,
	Publisher = {https://music-classification.github.io/tutorial},
	Title = {Music Classification: Beyond Supervised Learning, Towards Real-world Applications},
	Year = 2021,
	Url = {https://music-classification.github.io/tutorial}
}
