Audio augmentations library for PyTorch for audio in the time-domain

Last update: Jan 08, 2023

Related tags

Overview

Audio Augmentations

Audio augmentations library for PyTorch for audio in the time-domain, with support for stochastic data augmentations as used often in self-supervised / contrastive learning.

Usage

We can define several audio augmentations, which will be applied sequentially to a raw audio waveform:

from audio_augmentations import *

audio, sr = torchaudio.load("tests/classical.00002.wav")

num_samples = sr * 5
transforms = [
    RandomResizedCrop(n_samples=num_samples),
    RandomApply([PolarityInversion()], p=0.8),
    RandomApply([Noise(min_snr=0.3, max_snr=0.5)], p=0.3),
    RandomApply([Gain()], p=0.2),
    RandomApply([HighLowPass(sample_rate=sr)], p=0.8),
    RandomApply([Delay(sample_rate=sr)], p=0.5),
    RandomApply([PitchShift(
        n_samples=num_samples,
        sample_rate=sr
    )], p=0.4),
    RandomApply([Reverb(sample_rate=sr)], p=0.3)
]

We can return either one or many versions of the same audio example:

transform = Compose(transforms=transforms)
transformed_audio =  transform(audio)
>> transformed_audio.shape[0] = 1

> transformed_audio.shape[0] = 4 ">

audio = torchaudio.load("testing/classical.00002.wav")
transform = ComposeMany(transforms=transforms, num_augmented_samples=4)
transformed_audio = transform(audio)
>> transformed_audio.shape[0] = 4

Similar to the torchvision.datasets interface, an instance of the Compose or ComposeMany class can be supplied to a torchaudio dataloaders that accept transform=.

Optional

Install WavAugment for reverberation / pitch shifting:

pip install git+https://github.com/facebookresearch/WavAugment

Cite

You can cite this work with the following BibTeX:

@misc{spijkervet_torchaudio_augmentations,
  doi = {10.5281/ZENODO.4748582},
  url = {https://zenodo.org/record/4748582},
  author = {Spijkervet,  Janne},
  title = {Spijkervet/torchaudio-augmentations},
  publisher = {Zenodo},
  year = {2021},
  copyright = {MIT License}
}

Comments

Delay augmentation on cuda

Hi. Currently the delay augmentation doesn't work on gpu since part of the signal is on cpu. I think making thebeginning tensor same as the audio tensor device should fix it. Thanks. https://github.com/Spijkervet/torchaudio-augmentations/blob/d044f9d020e12032ab9280acf5f34a337e72d212/torchaudio_augmentations/augmentations/delay.py#L31

opened by sidml 2
Correctness unit test would be great

For some transforms, we can test if the values are actually correct by manually computing the expected value. For example, PolarityInversion could be test with some tiny tensors like [[0.1, 0.5, -1.0]]. Reverse as well. Probably only those two? Still, it'd be better than not having any.

opened by keunwoochoi 2
Default value of `max_snr` in `Noise`

1.0 of SNR with signal and white noise would be a really heavily corrupted signal. Could we set it to be a little more reasonable value?

Related; it'd be great if one can hear some examples of the augmented result.

opened by keunwoochoi 2
End-to-end PitchShift transform tests

This merge requests adds end-to-end pitch transformation detection with librosa's pYIN pitch detection, to test if the applied transformation yields the expected pitch transposition.

opened by Spijkervet 0
Unittests

This adds various unittests and fixes to multi-channel input for Reverb, Pitch, Reverse and HighLowPass filter augmentations. It also removes Essentia as a dependency, and instead uses julius for IRR filtering.

opened by Spijkervet 0
import error

when i import torchaudio_augmentation

I got the error

RuntimeError : torchaudio.sox_effects.sox_effects.effect_names requires module: torchaudio._torchaudio

how can I deal with it?

opened by EavnJeong 0
Snr db
Hi, Thanks for the interesting work. Allow me to suggest this change for two reasons:

Expressing SNR in dB cancels the doubt there might be between power SNR and RMS SNR.

When sampling an SNR, it feels to me like it makes more sense to uniformly sample from the log scale of the dB than on the linear range. This way you ensure that your low SNR have as much chances as your high SNR.

I hope it makes sens. I'd be glad to discuss further about that.
opened by wesbz 0
sanity check for duration

In transforms where the duration may change, if the input audio is shorter than n_samples, the error message is not intuitive. I forgot but in some case, the multiprocessing-based dataloader silently died. Maybe it's worth checking it somewhere?

opened by keunwoochoi 0
Shapes are still a bit confusing

From ComposeMany.__call__(), is x also a ch, time shape 2-dim tensor? And I'm sure what would be the expected behavior by this function, especially the shape of the output.

opened by keunwoochoi 3

Releases(v0.2.3)

v0.2.3(Nov 15, 2021)
This version:

Removes WavAugment's pitch detection, and instead relies on the parallelisable torch-pitch-shift package.

Adds support for batched inputs (batch, channel, time).

It also adds more tests for every audio transformation.

Source code(tar.gz)
Source code(zip)
0.2.0(Jun 29, 2021)

Removes the Essentia library as a dependency and adds compatibility for multi-channel audio.
Source code(tar.gz)
Source code(zip)
1.0(May 11, 2021)

Source code(tar.gz)
Source code(zip)

Owner

Janne

Music producer, machine learning in MIR & occasional ethical hacker

GitHub Repository

Mina - A Telegram Music Bot 5 mandatory Assistant written in Python using Pyrogram and Py-Tgcalls

3 Feb 07, 2022

A Python wrapper around the Soundcloud API

soundcloud-python A friendly wrapper around the Soundcloud API. Installation To install soundcloud-python, simply: pip install soundcloud Or if you'r

84 Dec 31, 2022

Jarvis From Basic to Advance - make a voice assistant similar to JARVIS (in iron man movie)

JARVIS (Basic to Advance) This was my attempt to make a voice assistant similar to JARVIS (in iron man movie) Let's be honest, it's not as intelligent

17 Dec 25, 2022

Stream Music 🎵 𝘼 𝙗𝙤𝙩 𝙩𝙝𝙖𝙩 𝙘𝙖𝙣 𝙥𝙡𝙖𝙮 𝙢𝙪𝙨𝙞𝙘 𝙤𝙣 𝙏𝙚𝙡𝙚𝙜𝙧𝙖𝙢 𝙂𝙧𝙤𝙪𝙥 𝙖𝙣𝙙 𝘾𝙝𝙖𝙣𝙣𝙚𝙡 𝙑𝙤𝙞𝙘𝙚 𝘾𝙝𝙖𝙩𝙨 𝘼𝙫𝙖𝙞𝙡?

Stream Music 🎵 𝘼 𝙗𝙤𝙩 𝙩𝙝𝙖𝙩 𝙘𝙖𝙣 𝙥𝙡𝙖𝙮 𝙢𝙪𝙨𝙞𝙘 𝙤𝙣 𝙏𝙚𝙡𝙚𝙜𝙧𝙖𝙢 𝙂𝙧𝙤𝙪𝙥 𝙖𝙣𝙙 𝘾𝙝𝙖𝙣𝙣𝙚𝙡 𝙑𝙤𝙞𝙘𝙚 𝘾𝙝𝙖𝙩𝙨 𝘼𝙫𝙖𝙞𝙡?

15 Nov 12, 2022

commonfate 📦commonfate 📦 - Common Fate Model and Transform.

Common Fate Transform and Model for Python This package is a python implementation of the Common Fate Transform and Model to be used for audio source

18 Jan 08, 2022

An Amazon Music client for Linux (unpretentious)

Amusiz An Amazon Music client for Linux (unpretentious) ↗️ Install You can install Amusiz in multiple ways, choose your favorite. 🚀 AppImage Here you

25 Nov 08, 2022

C++ library for audio and music analysis, description and synthesis, including Python bindings

Essentia Essentia is an open-source C++ library for audio analysis and audio-based music information retrieval released under the Affero GPL license.

2.3k Jan 03, 2023

Open Sound Strip, Sequence or Record in Audacity

Audacity Tools For Blender Sound editing in Blender Video Sequence Editor with Audacity integrated. Send/receive the full edited sequence or single st

64 Dec 31, 2022

Analyze, visualize and process sound field data recorded by spherical microphone arrays.

Sound Field Analysis toolbox for Python The sound_field_analysis toolbox (short: sfa) is a Python port of the Sound Field Analysis Toolbox (SOFiA) too

69 Nov 23, 2022

DCL - An easy to use diacritic library used for diacritic and accent manipulation.

Diacritics Library This library is used for adding, and removing diacritics from strings. Getting started Start by importing the module: import dcl DC

6 Jun 03, 2022

A music player designed for a University Project.

A music player designed for a University Project. Very flexibe and easy to use, a real life working application with user friendly controls. Hope u enjoy!!

1 Nov 19, 2021

Python audio and music signal processing library

madmom Madmom is an audio signal processing library written in Python with a strong focus on music information retrieval (MIR) tasks. The library is i

1k Dec 26, 2022

Python implementation of the Short Term Objective Intelligibility measure

Python implementation of STOI Implementation of the classical and extended Short Term Objective Intelligibility measures Intelligibility measure which

250 Dec 21, 2022

A python wrapper for REAPER

pyreaper A python wrapper for REAPER (Robust Epoch And Pitch EstimatoR) Installation pip install pyreaper Demonstration notebnook http://nbviewer.jupy

56 Dec 27, 2022

IDing the songs played on the do you radio show

36 Nov 15, 2022

Library for Python 3 to communicate with the Google Chromecast.

pychromecast Library for Python 3.6+ to communicate with the Google Chromecast. It currently supports: Auto discovering connected Chromecasts on the n

2.4k Jan 02, 2023

Open-Source bot to play songs in your Telegram's Group Voice Chat. Powered by @Akki_ThePro

VcPlayer Telegram Voice-Chat Bot [PyTGCalls] ⇝ Requirements ⇜ Account requirements A Telegram account to use as the music bot, You cannot use regular

2 Dec 25, 2021

A Quick Music Player Made Fully in Python

Quick Music Player Made Fully In Python. Pure Python, cross platform, single function module with no dependencies for playing sounds. Installation & S

1 Dec 24, 2021

Play any song directly into your group voice chat.

Telegram VCPlayer Bot Play any song directly into your group voice chat. Official Bot : VCPlayerBot | Discussion Group : VoiceChat Music Player Suppor

50 Nov 21, 2022

:speech_balloon: SpeechPy - A Library for Speech Processing and Recognition: http://speechpy.readthedocs.io/en/latest/

SpeechPy Official Project Documentation Table of Contents Documentation Which Python versions are supported Citation How to Install? Local Installatio

870 Dec 27, 2022

Audio augmentations library for PyTorch for audio in the time-domain

Related tags

Overview

Audio Augmentations

Usage

Optional

Cite

Comments

Releases(v0.2.3)

v0.2.3(Nov 15, 2021)

0.2.0(Jun 29, 2021)

1.0(May 11, 2021)

Owner

Janne

Mina - A Telegram Music Bot 5 mandatory Assistant written in Python using Pyrogram and Py-Tgcalls

A Python wrapper around the Soundcloud API

Jarvis From Basic to Advance - make a voice assistant similar to JARVIS (in iron man movie)

Stream Music 🎵 𝘼 𝙗𝙤𝙩 𝙩𝙝𝙖𝙩 𝙘𝙖𝙣 𝙥𝙡𝙖𝙮 𝙢𝙪𝙨𝙞𝙘 𝙤𝙣 𝙏𝙚𝙡𝙚𝙜𝙧𝙖𝙢 𝙂𝙧𝙤𝙪𝙥 𝙖𝙣𝙙 𝘾𝙝𝙖𝙣𝙣𝙚𝙡 𝙑𝙤𝙞𝙘𝙚 𝘾𝙝𝙖𝙩𝙨 𝘼𝙫𝙖𝙞𝙡?

commonfate 📦commonfate 📦 - Common Fate Model and Transform.

An Amazon Music client for Linux (unpretentious)

C++ library for audio and music analysis, description and synthesis, including Python bindings

Open Sound Strip, Sequence or Record in Audacity

Analyze, visualize and process sound field data recorded by spherical microphone arrays.

DCL - An easy to use diacritic library used for diacritic and accent manipulation.

A music player designed for a University Project.

Python audio and music signal processing library

Python implementation of the Short Term Objective Intelligibility measure

A python wrapper for REAPER

IDing the songs played on the do you radio show

Library for Python 3 to communicate with the Google Chromecast.

Open-Source bot to play songs in your Telegram's Group Voice Chat. Powered by @Akki_ThePro

A Quick Music Player Made Fully in Python

Play any song directly into your group voice chat.

:speech_balloon: SpeechPy - A Library for Speech Processing and Recognition: http://speechpy.readthedocs.io/en/latest/