Sound Event Detection with FilterAugment

Overview

Sound Event Detection with FilterAugment

Official implementation of

  • Heavily Augmented Sound Event Detection utilizing Weak Predictions (DCASE2021 Challenge Task 4 technical report)
    by Hyeonuk Nam, Byeong-Yun Ko, Gyeong-Tae Lee, Seong-Hu Kim, Won-Ho Jung, Sang-Min Choi, Yong-Hwa Park
    DCASE arXiv
    - arXiv version has updates on some minor errors

  • FilterAugment: An Acoustic Environmental Data Augmentation Method (Submitted to ICASSP 2022)
    by Hyeonuk Nam, Seong-Hu Kim, Yong-Hwa Park
    arXiv

    • Implementation for 2nd paper that includes updated version of FilterAugment is incomplete for now. It will be updated soon!

Ranked on [3rd place] in IEEE DCASE 2021 Task 4.

FilterAugment

Filter Augment is an audio data augmentation method newly proposed on the above papers for training acoustic models in audio/speech tasks. It applies random weights on randomly selected frequency bands. For more details, refer to the papers mentioned above.

  • This example shows two types of FilterAugment applied on log mel spectrogram of a 10-second audio clip. (a) shows original log mel spectrogram, (b) shows log mel spectrogram applied by step type FilterAugment (c) shows log mel spectrogram applied by linear type Filter Augment.
  • Applied filters are shown below. Filter (d) is applied on (a) to result in (b), and filter (e) is applied on (a) to result in (c)











  • Step type FilterAugment shows several frequency bands that are uniformly increased or decreased in amplitude, while linear type FilterAugment shows continous filter that shows certain peaks and dips.
  • On our participation on DCASE2021 challenge task 4, we used prototype FilterAugment which is step type FilterAugment without hyperparameter minimum bandwith. The code for this prototype is defiend as "filt_aug_dcase" at utils/data_aug.py @ line 107
  • Code for updated FilterAugment including step and linear type for ICASSP submission is defiend as "filt_aug_icassp" at utils/data_aug.py @ line 126

Requirements

Python version of 3.7.10 is used with following libraries

  • pytorch==1.8.0
  • pytorch-lightning==1.2.4
  • pytorchaudio==0.8.0
  • scipy==1.4.1
  • pandas==1.1.3
  • numpy==1.19.2

other requrements in requirements.txt

Datasets

You can download datasets by reffering to DCASE 2021 Task 4 description page or DCASE 2021 Task 4 baseline. Then, set the dataset directories in config yaml files accordingly. You need DESED real datasets (weak/unlabeled in domain/validation/public eval) and DESED synthetic datasets (train/validation).

Training

You can train and save model in exps folder by running:

python main.py

model settings:

There are 5 configuration files in this repo. Default setting is (ICASSP setting)(./configs/config_icassp.yaml), the optimal linear type FilterAugment described in paper submitted to ICASSP. There are 4 other model settings in DCASE tech report. To train for model 1, 2, 3 or 4 from the DCASE tech report or ICASSP setting, you can run the following code instead.

# for example, to train model 3:
python main.py --confing model3

Results of DCASE settings (model 1~4) on DESED Real Validation dataset:

Model PSDS-scenario1 PSDS-scenario2 Collar-based F1
1 0.408 0.628 49.0%
2 0.414 0.608 49.2%
3 0.381 0.660 31.8%
4 0.052 0.783 19.8%
  • these results are based on train models with single run for each setting

Results of ICASSP settings on DESED Real Validation dataset:

Methods PSDS-scenario1 PSDS-scenario2 Collar-based F1 Intersection-based F1
w/o FiltAug 0.387 0.598 47.7% 70.8%
step FiltAug 0.412 0.634 47.4% 71.2%
linear FiltAug 0.413 0.636 49.0% 73.5%
  • These results are based on max values of each metric for 3 separate runs on each setting (refer to paper for details).

Reference

DCASE 2021 Task 4 baseline

Citation & Contact

If this repository helped your works, please cite papers below!

@techreport{Nam2021,
    Author = "Nam, Hyeonuk and Ko, Byeong-Yun and Lee, Gyeong-Tae and Kim, Seong-Hu and Jung, Won-Ho and Choi, Sang-Min and Park, Yong-Hwa",
    title = "Heavily Augmented Sound Event Detection utilizing Weak Predictions",
    institution = "DCASE2021 Challenge",
    year = "2021",
    month = "June",
}

@article{nam2021filteraugment,
  title={FilterAugment: An Acoustic Environmental Data Augmentation Method},
  author={Hyeonuk Nam and Seoung-Hu Kim and Yong-Hwa Park},
  journal={arXiv preprint arXiv:2107.13260},
  year={2021}
}

Please contact Hyeonuk Nam at [email protected] for any query.

Official project website for the CVPR 2021 paper "Exploring intermediate representation for monocular vehicle pose estimation"

EgoNet Official project website for the CVPR 2021 paper "Exploring intermediate representation for monocular vehicle pose estimation". This repo inclu

Shichao Li 138 Dec 09, 2022
Implementation of the paper Scalable Intervention Target Estimation in Linear Models (NeurIPS 2021), and the code to generate simulation results.

Scalable Intervention Target Estimation in Linear Models Implementation of the paper Scalable Intervention Target Estimation in Linear Models (NeurIPS

0 Oct 25, 2021
🇰🇷 Text to Image in Korean

KoDALLE Utilizing pretrained language model’s token embedding layer and position embedding layer as DALLE’s text encoder. Background Training DALLE mo

HappyFace 74 Sep 22, 2022
Code for Massive-scale Decoding for Text Generation using Lattices

Massive-scale Decoding for Text Generation using Lattices Jiacheng Xu, Greg Durrett TL;DR: a new search algorithm to construct lattices encoding many

Jiacheng Xu 37 Dec 18, 2022
A library for differentiable nonlinear optimization.

Theseus A library for differentiable nonlinear optimization built on PyTorch to support constructing various problems in robotics and vision as end-to

Meta Research 1.1k Dec 30, 2022
QuickAI is a Python library that makes it extremely easy to experiment with state-of-the-art Machine Learning models.

QuickAI is a Python library that makes it extremely easy to experiment with state-of-the-art Machine Learning models.

152 Jan 02, 2023
The CLRS Algorithmic Reasoning Benchmark

Learning representations of algorithms is an emerging area of machine learning, seeking to bridge concepts from neural networks with classical algorithms.

DeepMind 251 Jan 05, 2023
Hl classification bc - A Network-Based High-Level Data Classification Algorithm Using Betweenness Centrality

A Network-Based High-Level Data Classification Algorithm Using Betweenness Centr

Esteban Vilca 3 Dec 01, 2022
This repo is to present various code demos on how to use our Graph4NLP library.

Deep Learning on Graphs for Natural Language Processing Demo The repository contains code examples for DLG4NLP tutorials at NAACL 2021, SIGIR 2021, KD

Graph4AI 143 Dec 23, 2022
[CVPR 2019 Oral] Multi-Channel Attention Selection GAN with Cascaded Semantic Guidance for Cross-View Image Translation

SelectionGAN for Guided Image-to-Image Translation CVPR Paper | Extended Paper | Guided-I2I-Translation-Papers Citation If you use this code for your

Hao Tang 424 Dec 02, 2022
SustainBench: Benchmarks for Monitoring the Sustainable Development Goals with Machine Learning

Datasets | Website | Raw Data | OpenReview SustainBench: Benchmarks for Monitoring the Sustainable Development Goals with Machine Learning Christopher

67 Dec 17, 2022
Weakly-Supervised Semantic Segmentation Network with Deep Seeded Region Growing (CVPR 2018).

Weakly-Supervised Semantic Segmentation Network with Deep Seeded Region Growing (CVPR2018) By Zilong Huang, Xinggang Wang, Jiasi Wang, Wenyu Liu and J

Zilong Huang 245 Dec 13, 2022
A machine learning library for spiking neural networks. Supports training with both torch and jax pipelines, and deployment to neuromorphic hardware.

Rockpool Rockpool is a Python package for developing signal processing applications with spiking neural networks. Rockpool allows you to build network

SynSense 21 Dec 14, 2022
A large-scale benchmark for co-optimizing the design and control of soft robots, as seen in NeurIPS 2021.

Evolution Gym A large-scale benchmark for co-optimizing the design and control of soft robots. As seen in Evolution Gym: A Large-Scale Benchmark for E

121 Dec 14, 2022
Automated image registration. Registrationimation was too much of a mouthful.

alignimation Automated image registration. Registrationimation was too much of a mouthful. This repo contains the code used for my blog post Alignimat

Ethan Rosenthal 9 Oct 13, 2022
Official implementation of "Synthetic Temporal Anomaly Guided End-to-End Video Anomaly Detection" (ICCV Workshops 2021: RSL-CV).

Official PyTorch implementation of "Synthetic Temporal Anomaly Guided End-to-End Video Anomaly Detection" This is the implementation of the paper "Syn

Marcella Astrid 11 Oct 07, 2022
A PyTorch implementation of PointRend: Image Segmentation as Rendering

PointRend A PyTorch implementation of PointRend: Image Segmentation as Rendering [arxiv] [Official Implementation: Detectron2] This repo for Only Sema

AhnDW 336 Dec 26, 2022
Implementation of Bottleneck Transformer in Pytorch

Bottleneck Transformer - Pytorch Implementation of Bottleneck Transformer, SotA visual recognition model with convolution + attention that outperforms

Phil Wang 621 Jan 06, 2023
(AAAI2020)Grapy-ML: Graph Pyramid Mutual Learning for Cross-dataset Human Parsing

Grapy-ML: Graph Pyramid Mutual Learning for Cross-dataset Human Parsing This repository contains pytorch source code for AAAI2020 oral paper: Grapy-ML

54 Aug 04, 2022
The official implementation of the IEEE S&P`22 paper "SoK: How Robust is Deep Neural Network Image Classification Watermarking".

Watermark-Robustness-Toolbox - Official PyTorch Implementation This repository contains the official PyTorch implementation of the following paper to

49 Dec 19, 2022