The official repository for Audio ALBERT

Related tags

AudioAALBERT
Overview

AALBERT

Here is also the official repository of AALBERT, which is Pytorch lightning reimplementation of the paper, Audio ALBERT: A Lite Bert for Self-Supervised Learning of Audio Representation. The original code is in AlbertNew branch of s3prl repo. In the paper, we proposed Audio ALBERT, which achieves performance comparable with massive pre-trained networks in the downstream tasks while having 91% fewer parameters.

drawingdrawing

Dependencies

  • Python 3.8
  • Computing power (high-end GPU) and memory space (both RAM/GPU's RAM) is extremely important if you'd like to train your own model.
  • Required packages and their use are listed requirements.txt.
  • pip install -r requirements.txt

Pretrain Stage

We use LibriSpeech as our pretraining stage dataset. You can download dataset by this link.

  • Stage 1: modify dataset path to your local dataset path:

    • AALBERT: config path: upstream/aalbert/pretrain_config.yaml
          line 16: datarc:
                  {Your dataset key name}: {your local dataset path}
    • Mockingjay: upstream/mockingjay/pretrain_config.yaml
          line 16: datarc:
                  {Your dataset key name}: {your local dataset path}
  • Stage 2: run pretraining script

    python run_pretrain.py -n aalbert_pretrained -u aalbert

    • -n : experiment_name
    • -u : upstream model: {two option: aalbert / mockingjay}
    • model will save on result folder after finish pretraining stage.

Downstream Stage

Here, we take voxceleb1 speaker classification as our downstream task. You can download dataset from their official website.

After pretraining, We can extract the pretrained model feature on different downstream tasks.

  • Stage 1: modify dataset path to your local dataset path
    • voxceleb1_speaker: config path: downstream/voxceleb1_speaker/train_config.yaml
    line  9: datarc:
    line 10:    file_path: {your dataset folder path}
    line 11:    meta_path: {your label file path}
  • Stage 2: run downstream script
    • voxceleb1_speaker:
      python run_downstream.py \
      -c downstream/voxceleb1_speaker/train_config.yaml \
      -g result/pretrain/{your_pretrained_model_folder}/model_config.yaml  \
      -t result/pretrain/{your_pretrained_model_folder}/pretrained_config.yaml \
      -u aalbert \
      -d voxceleb1_speaker \
      -k result/pretrained/{your pretrained_model_folder}/checkpoints/{checkpoint_you_want_to_use.ckpt} \
      -n voxceleb1_result
    • -n: experiment name
    • -c: downstream training config
    • -g: pretrained model config
    • -t: load pretrained model pretrained config
    • -u: upstream model: {two option: aalbert / mockingjay}
    • -d: downstream task name
    • -k: model checkpoint path
    • -f: finetune pretrained model or not, default=False
Owner
pohan
pohan
Jarvis From Basic to Advance - make a voice assistant similar to JARVIS (in iron man movie)

JARVIS (Basic to Advance) This was my attempt to make a voice assistant similar to JARVIS (in iron man movie) Let's be honest, it's not as intelligent

codesempai 17 Dec 25, 2022
Generating a structured library of .wav samples with Python.

sample-library Scripts for generating a structured sample library with Python Requires Docker about Samples are written to wave files in lib/. Differe

Ben Mangold 1 Nov 11, 2021
Expressive Digital Signal Processing (DSP) package for Python

AudioLazy Development Last release PyPI status Real-Time Expressive Digital Signal Processing (DSP) Package for Python! Laziness and object representa

Danilo de Jesus da Silva Bellini 642 Dec 26, 2022
A python program to cut longer MP3 files (i.e. recordings of several songs) into the individual tracks.

I'm writing a python script to cut longer MP3 files (i.e. recordings of several songs) into the individual tracks called ReCut. So far there are two

Dönerspiess 1 Oct 27, 2021
Dataset and baseline code for the VocalSound dataset (ICASSP2022).

VocalSound: A Dataset for Improving Human Vocal Sounds Recognition Introduction Citing Download VocalSound Dataset Details Baseline Experiment Contact

Yuan Gong 58 Jan 03, 2023
A Python 3 script for capturing and recording a SDR stream to a WAV file (or serving it to a HTTP audio stream).

rfsoapyfile A Python 3 script for capturing and recording a SDR stream to a WAV file (or serving it to a HTTP audio stream). The script is threaded fo

4 Dec 19, 2022
Make an audio file (really) long-winded

longwind Make an audio file (really) long-winded Daily repetitions are an illusion anyway.

Vincent Lostanlen 2 Sep 12, 2022
Voice helper on russian

Voice helper on russian

KreO 1 Jun 30, 2022
🎵 Python sound notifications made easy

chime Python sound notifications made easy. Table of contents Table of contents Motivation Installation Basic usage Theming IPython/Jupyter magic Exce

Max Halford 231 Jan 09, 2023
A simple voice detection system which can be applied practically for designing a device with capability to detect a baby’s cry and automatically turning on music

Auto-Baby-Cry-Detection-with-Music-Player A simple voice detection system which can be applied practically for designing a device with capability to d

2 Dec 15, 2021
Mentos Music Bot With Python

Mentos Music Bot For Any Query Join Our Support Group 👥 Special Thanks - @OfficialYukki Hey Welcome To Here 💫 💫 You Can Make Your Own Music Bot Fo

Cyber Toxic 13 Oct 21, 2022
An audio digital processing toolbox based on a workflow/pipeline principle

AudioTK Audio ToolKit is a set of audio filters. It helps assembling workflows for specific audio processing workloads. The audio workflow is split in

Matthieu Brucher 238 Oct 18, 2022
A tool for retrieving audio in the past

Rewinder A tool for retrieving audio in the past. Ever felt like, I need to remember that discussion which happened 10 min back. Now you can! Rewind a

Bharat 1 Jan 24, 2022
Audio pitch-shifting & re-sampling utility, based on the EMU SP-1200

Pitcher.py Free & OS emulation of the SP-12 & SP-1200 signal chain (now with GUI) Pitch shift / bitcrush / resample audio files Written and tested in

morgan 13 Oct 03, 2022
Cobra is a highly-accurate and lightweight voice activity detection (VAD) engine.

On-device voice activity detection (VAD) powered by deep learning.

Picovoice 88 Dec 16, 2022
Music generation using ml / dl

Data analysis Document here the project: deep_music Description: Project Description Data Source: Type of analysis: Please document the project the be

0 Jul 03, 2022
Read music meta data and length of MP3, OGG, OPUS, MP4, M4A, FLAC, WMA and Wave files with python 2 or 3

tinytag tinytag is a library for reading music meta data of MP3, OGG, OPUS, MP4, M4A, FLAC, WMA and Wave files with python Install pip install tinytag

Tom Wallroth 577 Dec 26, 2022
A music player designed for a University Project.

A music player designed for a University Project. Very flexibe and easy to use, a real life working application with user friendly controls. Hope u enjoy!!

Aditya Johorey 1 Nov 19, 2021
Datamoshing with FFmpeg

ffmosher Datamoshing with FFmpeg Drag and drop video onto mosh.bat to create a datamoshed video. To datamosh an image, please ensure the file is in a

18 Sep 11, 2022
A lightweight yet powerful audio-to-MIDI converter with pitch bend detection

Basic Pitch is a Python library for Automatic Music Transcription (AMT), using lightweight neural network developed by Spotify's Audio Intelligence La

Spotify 1.4k Jan 01, 2023