The official repository for Audio ALBERT

Related tags

AudioAALBERT
Overview

AALBERT

Here is also the official repository of AALBERT, which is Pytorch lightning reimplementation of the paper, Audio ALBERT: A Lite Bert for Self-Supervised Learning of Audio Representation. The original code is in AlbertNew branch of s3prl repo. In the paper, we proposed Audio ALBERT, which achieves performance comparable with massive pre-trained networks in the downstream tasks while having 91% fewer parameters.

drawingdrawing

Dependencies

  • Python 3.8
  • Computing power (high-end GPU) and memory space (both RAM/GPU's RAM) is extremely important if you'd like to train your own model.
  • Required packages and their use are listed requirements.txt.
  • pip install -r requirements.txt

Pretrain Stage

We use LibriSpeech as our pretraining stage dataset. You can download dataset by this link.

  • Stage 1: modify dataset path to your local dataset path:

    • AALBERT: config path: upstream/aalbert/pretrain_config.yaml
          line 16: datarc:
                  {Your dataset key name}: {your local dataset path}
    • Mockingjay: upstream/mockingjay/pretrain_config.yaml
          line 16: datarc:
                  {Your dataset key name}: {your local dataset path}
  • Stage 2: run pretraining script

    python run_pretrain.py -n aalbert_pretrained -u aalbert

    • -n : experiment_name
    • -u : upstream model: {two option: aalbert / mockingjay}
    • model will save on result folder after finish pretraining stage.

Downstream Stage

Here, we take voxceleb1 speaker classification as our downstream task. You can download dataset from their official website.

After pretraining, We can extract the pretrained model feature on different downstream tasks.

  • Stage 1: modify dataset path to your local dataset path
    • voxceleb1_speaker: config path: downstream/voxceleb1_speaker/train_config.yaml
    line  9: datarc:
    line 10:    file_path: {your dataset folder path}
    line 11:    meta_path: {your label file path}
  • Stage 2: run downstream script
    • voxceleb1_speaker:
      python run_downstream.py \
      -c downstream/voxceleb1_speaker/train_config.yaml \
      -g result/pretrain/{your_pretrained_model_folder}/model_config.yaml  \
      -t result/pretrain/{your_pretrained_model_folder}/pretrained_config.yaml \
      -u aalbert \
      -d voxceleb1_speaker \
      -k result/pretrained/{your pretrained_model_folder}/checkpoints/{checkpoint_you_want_to_use.ckpt} \
      -n voxceleb1_result
    • -n: experiment name
    • -c: downstream training config
    • -g: pretrained model config
    • -t: load pretrained model pretrained config
    • -u: upstream model: {two option: aalbert / mockingjay}
    • -d: downstream task name
    • -k: model checkpoint path
    • -f: finetune pretrained model or not, default=False
Owner
pohan
pohan
ᴀ ʙᴏᴛ ᴛʜᴀᴛ ᴄᴀɴ ᴘʟᴀʏ ᴍᴜꜱɪᴄ ɪɴ ᴛᴇʟᴇɢʀᴀᴍ ɢʀᴏᴜᴘ ᴏɴ ᴠᴏɪᴄᴇ ᴄᴀʟʟ

GJ516 LOVER'S ııllıllı ♥️ ➤⃝Gᴊ516_ᴍᴜꜱɪᴄ_ʙᴏᴛ ♥️ ıllıllı ᴀ ʙᴏᴛ ᴛʜᴀᴛ ᴄᴀɴ ᴘʟᴀʏ ᴍᴜꜱɪᴄ ɪɴ ᴛᴇʟᴇɢʀᴀᴍ ɢʀᴏᴜᴘ ᴏɴ ᴠᴏɪᴄᴇ ᴄᴀʟʟ Requirements 📝 FFmpeg NodeJS nodesou

1 Nov 22, 2021
Python audio and music signal processing library

madmom Madmom is an audio signal processing library written in Python with a strong focus on music information retrieval (MIR) tasks. The library is i

Institute of Computational Perception 1k Dec 26, 2022
Codes for "Efficient Long-Range Attention Network for Image Super-resolution"

ELAN Codes for "Efficient Long-Range Attention Network for Image Super-resolution", arxiv link. Dependencies & Installation Please refer to the follow

xindong zhang 124 Dec 22, 2022
Music player - endlessly plays your music

Music player First, if you wonder about what is supposed to be a music player or what makes a music player different from a simple media player, read

Albert Zeyer 482 Dec 19, 2022
Automatically move or copy files based on metadata associated with the files. For example, file your photos based on EXIF metadata or use MP3 tags to file your music files.

Automatically move or copy files based on metadata associated with the files. For example, file your photos based on EXIF metadata or use MP3 tags to file your music files.

Rhet Turnbull 14 Nov 02, 2022
A Music Player Bot for Discord Servers

A Music Player Bot for Discord Servers

Halil Acar 2 Oct 25, 2021
❤️ This Is The EzilaXMusicPlayer Advaced Repo 🎵

Telegram EzilaXMusicPlayer Bot 🎵 A bot that can play music on telegram group's voice Chat ❤️ Requirements 📝 FFmpeg NodeJS nodesource.com Python 3.7+

Sadew Jayasekara 11 Nov 12, 2022
Port Hitsuboku Kumi Chinese CVVC voicebank to deepvocal. / 筆墨クミDeepvocal中文音源

Hitsuboku Kumi (筆墨クミ) is a UTAU virtual singer developed by Cubialpha. This project ports Hitsuboku Kumi Chinese CVVC voicebank to deepvocal. This is the first open-source deepvocal voicebank on Gith

8 Apr 26, 2022
Learn chords with your MIDI keyboard !

miditeach miditeach is a music learning tool that can be used to practice your chords skills with a midi keyboard 🎹 ! Features Midi keyboard input se

Alexis LOUIS 3 Oct 20, 2021
MUSIC-AVQA, CVPR2022 (ORAL)

Audio-Visual Question Answering (AVQA) PyTorch code accompanies our CVPR 2022 paper: Learning to Answer Questions in Dynamic Audio-Visual Scenarios (O

44 Dec 23, 2022
Voice to Text using Raspberry Pi

This module will help to convert your voice (speech) into text using Speech Recognition Library. You can control the devices or you can perform the desired tasks by the word recognition

Raspberry_Pi Pakistan 2 Dec 15, 2021
A music player designed for a University Project.

A music player designed for a University Project. Very flexibe and easy to use, a real life working application with user friendly controls. Hope u enjoy!!

Aditya Johorey 1 Nov 19, 2021
Using python to generate a bat script of repetitive lines of code that differ in some way but can sort out a group of audio files according to their common names

Batch Sorting Using python to generate a bat script of repetitive lines of code that differ in some way but can sort out a group of audio files accord

David Mainoo 1 Oct 29, 2021
Python library for handling audio datasets.

AUDIOMATE Audiomate is a library for easy access to audio datasets. It provides the datastructures for accessing/loading different datasets in a gener

Matthias 121 Nov 27, 2022
Python tools for the corpus analysis of popular music.

CATCHY Corpus Analysis Tools for Computational Hook discovery Python tools for the corpus analysis of popular music recordings. The tools can be used

Jan VB 20 Aug 20, 2022
Python I/O for STEM audio files

stempeg = stems + ffmpeg Python package to read and write STEM audio files. Technically, stems are audio containers that combine multiple audio stream

Fabian-Robert Stöter 72 Dec 23, 2022
A tool for retrieving audio in the past

Rewinder A tool for retrieving audio in the past. Ever felt like, I need to remember that discussion which happened 10 min back. Now you can! Rewind a

Bharat 1 Jan 24, 2022
Telegram Voice-Chat Bot Written In Python Using Pyrogram.

Telegram Voice-Chat Bot Telegram Voice-Chat Bot To Play Music From Various Sources In Your Group Support All linux based os. Windows Mac Diagram Requi

TheHamkerCat 314 Dec 29, 2022
PianoPlayer - Automatic fingering generator for piano scores

PianoPlayer - Automatic fingering generator for piano scores

Marco Musy 571 Jan 02, 2023
A simple python script to play bell sound in your system infinitely, just for fun and experimental purposes

A simple python script to play bell sound in your system infinitely, just for fun and experimental purposes

نافع الهلالي 1 Oct 29, 2021