Code for the paper "Unsupervised Contrastive Learning of Sound Event Representations", ICASSP 2021.

Related tags

Deep Learninguclser20
Overview

Unsupervised Contrastive Learning of
Sound Event Representations

This repository contains the code for the following paper. If you use this code or part of it, please cite:

Eduardo Fonseca, Diego Ortego, Kevin McGuinness, Noel E. O'Connor, Xavier Serra, "Unsupervised Contrastive Learning of Sound Event Representations", ICASSP 2021.

arXiv slides poster blog post video

We propose to learn sound event representations using the proxy task of contrasting differently augmented views of sound events, inspired by SimCLR [1]. The different views are computed by:

  • sampling TF patches at random within every input clip,
  • mixing resulting patches with unrelated background clips (mix-back), and
  • other data augmentations (DAs) (RRC, compression, noise addition, SpecAugment [2]).

Our proposed system is illustrated in the figure.

system

Our results suggest that unsupervised contrastive pre-training can mitigate the impact of data scarcity and increase robustness against noisy labels. Please check our paper for more details, or have a quicker look at our slide deck, poster, blog post, or video presentation (see links above).

This repository contains the framework that we used for our paper. It comprises the basic stages to learn an audio representation via unsupervised contrastive learning, and then evaluate the representation via supervised sound event classifcation. The system is implemented in PyTorch.

Dependencies

This framework is tested on Ubuntu 18.04 using a conda environment. To duplicate the conda environment:

conda create --name <envname> --file spec-file.txt

Directories and files

FSDnoisy18k/ includes folders to locate the FSDnoisy18k dataset and a FSDnoisy18k.py to load the dataset (train, val, test), including the data loader for contrastive and supervised training, applying transforms or mix-back when appropriate
config/ includes *.yaml files defining parameters for the different training modes
da/ contains data augmentation code, including augmentations mentioned in our paper and more
extract/ contains feature extraction code. Computes an .hdf5 file containing log-mel spectrograms and associated labels for a given subset of data
logs/ folder for output logs
models/ contains definitions for the architectures used (ResNet-18, VGG-like and CRNN)
pth/ contains provided pre-trained models for ResNet-18, VGG-like and CRNN
src/ contains functions for training and evaluation in both supervised and unsupervised fashion
main_train.py is the main script
spec-file.txt contains conda environment specs

Usage

(0) Download the dataset

Download FSDnoisy18k [3] from Zenodo through the dataset companion site, unzip it and locate it in a given directory. Fix paths to dataset in ctrl section of *.yaml. It can be useful to have a look at the different training sets of FSDnoisy18k: a larger set of noisy labels and a small set of clean data [3]. We use them for training/validation in different ways.

(1) Prepare the dataset

Create an .hdf5 file containing log-mel spectrograms and associated labels for each subset of data:

python extract/wav2spec.py -m test -s config/params_unsupervised_cl.yaml

Use -m with train, val or test to extract features from each subset. All the extraction parameters are listed in params_unsupervised_cl.yaml. Fix path to .hdf5 files in ctrl section of *.yaml.

(2) Run experiment

Our paper comprises three training modes. For convenience, we provide yaml files defining the setup for each of them.

  1. Unsupervised contrastive representation learning by comparing differently augmented views of sound events. The outcome of this stage is a trained encoder to produce low-dimensional representations. Trained encoders are saved under results_models/ using a folder name based on the string experiment_name in the corresponding yaml (make sure to change it).
CUDA_VISIBLE_DEVICES=0 python main_train.py -p config/params_unsupervised_cl.yaml &> logs/output_unsup_cl.out
  1. Evaluation of the representation using a previously trained encoder. Here, we do supervised learning by minimizing cross entropy loss without data agumentation. Currently, we load the provided pre-trained models sitting in pth/ (you can change this in main_train.py, search for select model). We follow two evaluation methods:

    • Linear Evaluation: train an additional linear classifier on top of the pre-trained unsupervised embeddings.

      CUDA_VISIBLE_DEVICES=0 python main_train.py -p config/params_supervised_lineval.yaml &> logs/output_lineval.out
      
    • End-to-end Fine Tuning: fine-tune entire model on two relevant downstream tasks after initializing with pre-trained weights. The two downstream tasks are:

      • training on the larger set of noisy labels and validate on train_clean. This is chosen by selecting train_on_clean: 0 in the yaml.
      • training on the small set of clean data (allowing 15% for validation). This is chosen by selecting train_on_clean: 1 in the yaml.

      After choosing the training set for the downstream task, run:

      CUDA_VISIBLE_DEVICES=0 python main_train.py -p config/params_supervised_finetune.yaml &> logs/output_finetune.out
      

The setup in the yaml files should provide the best results reported in our paper. JFYI, the main flags that determine the training mode are downstream, lin_eval and method in the corresponding yaml (they are already adequately set in each yaml).

(3) See results:

Check the logs/*.out for printed results at the end. Main evaluation metric is balanced (macro) top-1 accuracy. Trained models are saved under results_models/models* and some metrics are saved under results_models/metrics*.

Model Zoo

We provide pre-trained encoders as described in our paper, for ResNet-18, VGG-like and CRNN architectures. See pth/ folder. Note that better encoders could likely be obtained through a more exhaustive exploration of the data augmentation compositions, thus defining a more challenging proxy task. Also, we trained on FSDnoisy18k due to our limited compute resources at the time, yet this framework can be directly applied to other larger datasets such as FSD50K or AudioSet.

Citation

@inproceedings{fonseca2021unsupervised,
  title={Unsupervised Contrastive Learning of Sound Event Representations},
  author={Fonseca, Eduardo and Ortego, Diego and McGuinness, Kevin and O'Connor, Noel E. and Serra, Xavier},
  booktitle={2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2021},
  organization={IEEE}
}

Contact

You are welcome to contact [email protected] should you have any question/suggestion. You can also create an issue.

Acknowledgment

This work is a collaboration between the MTG-UPF and Dublin City University's Insight Centre. This work is partially supported by Science Foundation Ireland (SFI) under grant number SFI/15/SIRG/3283 and by the Young European Research University Network under a 2020 mobility award. Eduardo Fonseca is partially supported by a Google Faculty Research Award 2018. The authors are grateful for the GPUs donated by NVIDIA.

References

[1] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, “A Simple Framework for Contrastive Learning of Visual Representations,” in Int. Conf. on Mach. Learn. (ICML), 2020

[2] Park et al., SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition. InterSpeech 2019

[3] E. Fonseca, M. Plakal, D. P. W. Ellis, F. Font, X. Favory, X. Serra, "Learning Sound Event Classifiers from Web Audio with Noisy Labels", In proceedings of ICASSP 2019, Brighton, UK

Owner
Eduardo Fonseca
Returning research intern at Google Research | PhD candidate at Music Technology Group, Universitat Pompeu Fabra
Eduardo Fonseca
Unbalanced Feature Transport for Exemplar-based Image Translation (CVPR 2021)

UNITE and UNITE+ Unbalanced Feature Transport for Exemplar-based Image Translation (CVPR 2021) Unbalanced Intrinsic Feature Transport for Exemplar-bas

Fangneng Zhan 183 Nov 09, 2022
This is an easy python software which allows to sort images with faces by gender and after by age.

Gender-age Classifier This is an easy python software which allows to sort images with faces by gender and after by age. Usage First install Deepface

Claudio Ciccarone 6 Sep 17, 2022
Aspect-Sentiment-Multiple-Opinion Triplet Extraction (NLPCC 2021)

The code and data for the paper "Aspect-Sentiment-Multiple-Opinion Triplet Extraction" Requirements Python 3.6.8 torch==1.2.0 pytorch-transformers==1.

慢半拍 5 Jul 02, 2022
Few-shot NLP benchmark for unified, rigorous eval

FLEX FLEX is a benchmark and framework for unified, rigorous few-shot NLP evaluation. FLEX enables: First-class NLP support Support for meta-training

AI2 85 Dec 03, 2022
A Python type explainer!

typesplainer A Python typehint explainer! Available as a cli, as a website, as a vscode extension, as a vim extension Usage First, install the package

Typesplainer 79 Dec 01, 2022
Group Fisher Pruning for Practical Network Compression(ICML2021)

Group Fisher Pruning for Practical Network Compression (ICML2021) By Liyang Liu*, Shilong Zhang*, Zhanghui Kuang, Jing-Hao Xue, Aojun Zhou, Xinjiang W

Shilong Zhang 129 Dec 13, 2022
TrTr: Visual Tracking with Transformer

TrTr: Visual Tracking with Transformer We propose a novel tracker network based on a powerful attention mechanism called Transformer encoder-decoder a

趙 漠居(Zhao, Moju) 66 Dec 27, 2022
Pytorch-Swin-Unet-V2 - a modified version of Swin Unet based on Swin Transfomer V2

Swin Unet V2 Swin Unet V2 is a modified version of Swin Unet arxiv based on Swin

Chenxu Peng 26 Dec 03, 2022
🎁 3,000,000+ Unsplash images made available for research and machine learning

The Unsplash Dataset The Unsplash Dataset is made up of over 250,000+ contributing global photographers and data sourced from hundreds of millions of

Unsplash 2k Jan 03, 2023
Neural network chess engine trained on Gary Kasparov's games.

Neural Chess It's not the best chess engine, but it is a chess engine. Proof of concept neural network chess engine (feed-forward multi-layer perceptr

3 Jun 22, 2022
an implementation of Video Frame Interpolation via Adaptive Separable Convolution using PyTorch

This work has now been superseded by: https://github.com/sniklaus/revisiting-sepconv sepconv-slomo This is a reference implementation of Video Frame I

Simon Niklaus 985 Jan 08, 2023
OpenABC-D: A Large-Scale Dataset For Machine Learning Guided Integrated Circuit Synthesis

OpenABC-D: A Large-Scale Dataset For Machine Learning Guided Integrated Circuit Synthesis Overview OpenABC-D is a large-scale labeled dataset generate

NYU Machine-Learning guided Design Automation (MLDA) 31 Nov 22, 2022
This is the face keypoint train code of project face-detection-project

face-key-point-pytorch 1. Data structure The structure of landmarks_jpg is like below: |--landmarks_jpg |----AFW |------AFW_134212_1_0.jpg |------AFW_

I‘m X 3 Nov 27, 2022
Picasso: A CUDA-based Library for Deep Learning over 3D Meshes

The Picasso Library is intended for complex real-world applications with large-scale surfaces, while it also performs impressively on the small-scale applications over synthetic shape manifolds. We h

97 Dec 01, 2022
Differentiable Factor Graph Optimization for Learning Smoothers @ IROS 2021

Differentiable Factor Graph Optimization for Learning Smoothers Overview Status Setup Datasets Training Evaluation Acknowledgements Overview Code rele

Brent Yi 60 Nov 14, 2022
A dataset for online Arabic calligraphy

Calliar Calliar is a dataset for Arabic calligraphy. The dataset consists of 2500 json files that contain strokes manually annotated for Arabic callig

ARBML 114 Dec 28, 2022
Pytorch implementation of Bert and Pals: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning

PyTorch implementation of BERT and PALs Introduction Work by Asa Cooper Stickland and Iain Murray, University of Edinburgh. Code for BERT and PALs; mo

Asa Cooper Stickland 70 Dec 29, 2022
BEAMetrics: Benchmark to Evaluate Automatic Metrics in Natural Language Generation

BEAMetrics: Benchmark to Evaluate Automatic Metrics in Natural Language Generation Installing The Dependencies $ conda create --name beametrics python

7 Jul 04, 2022
HyDiff: Hybrid Differential Software Analysis

HyDiff: Hybrid Differential Software Analysis This repository provides the tool and the evaluation subjects for the paper HyDiff: Hybrid Differential

Yannic Noller 22 Oct 20, 2022
Check out the StyleGAN repo and place it in the same directory hierarchy as the present repo

Variational Model Inversion Attacks Kuan-Chieh Wang, Yan Fu, Ke Li, Ashish Khisti, Richard Zemel, Alireza Makhzani Most commands are in run_scripts. W

Jackson Wang 15 Dec 26, 2022