Audio Retrieval with Natural Language Queries: A Benchmark Study


Paper | Project page | Text-to-audio search demo


This repository is the implementation of Audio Retrieval with Natural Language Queries: A Benchmark Study. It builds on the Audio Retrieval with Natural Language Queries repository and provides code for downloading the SoundDescs dataset and for reproducing all results from the paper. The code is based on the Use What You Have: Video retrieval using representations from collaborative experts and MMT: Multi-modal Transformer for Video Retrieval repositories.

The datasets used in this paper are SoundDescs, AudioCaps, CLOTHO, Activity-Net and QuerYD.

Requirements and datasets

The libraries required to run this code are listed in requirements.txt. CUDA 10.1 and Python 3.7 were used.

conda create --name audio-retrieval python=3.7
conda activate audio-retrieval
pip install -r requirements.txt
conda install -c conda-forge ffmpeg

To run the code below, features extracted from the various datasets need to be downloaded. If your working location does not have enough space to store these features (for SoundDescs and AudioCaps the files are larger than 6GB, while the others are under 1GB), create a folder called data inside this repository as a symlink to a folder with enough disk space. As an example, run the following from the audio-retrieval-benchmark base directory:

ln -s {path to a folder with enough space} data

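The same check-and-link step can be sketched in Python. This is an illustrative helper rather than part of the repository; the function name link_data_folder and the default 6 GB threshold (taken from the feature sizes mentioned above) are assumptions.

```python
import os
import shutil


def link_data_folder(storage_dir, link_name="data", min_free_gb=6):
    """Symlink `link_name` to `storage_dir` if that folder has enough free space.

    Hypothetical helper: name and threshold are assumptions for illustration.
    """
    free_gb = shutil.disk_usage(storage_dir).free / 1e9
    if free_gb < min_free_gb:
        raise RuntimeError(f"only {free_gb:.1f} GB free in {storage_dir}")
    # Leave an existing file, folder, or link untouched.
    if not os.path.lexists(link_name):
        os.symlink(os.path.abspath(storage_dir), link_name)
    return os.path.realpath(link_name)
```

Calling link_data_folder with the storage path from the repository base directory would create the data link only when the target has room for the larger feature files.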
To download features for the AudioCaps, Clotho, Activity-Net, and QuerYD datasets, follow the steps here. The SoundDescs features can be downloaded analogously:

python3 misc/sync_experts.py --dataset SoundDescs

If you want to use the raw audio data for SoundDescs, we explain how to download the dataset below.

SoundDescs dataset download and pre-processing

This is a tool to allow for easy download of audio files and text information from the https://sound-effects.bbcrewind.co.uk/search page.

Downloading audios
First download the download_links_renamed.txt or, if needed, the download_links.txt file. Save it in the folder that will be used for downloading audios. To be able to download the files the --download_folder flag must be set when running the commands below.

To only download a few audio files, use the --limit flag with non-zero values.

To download audio files in zip form for the SoundDescs dataset, run the line below. To download multiple files at the same time, use the --processes flag. We recommend using no more than two processes to avoid being blocked by the website.

python sounddescs_download_audios.py --download_folder {location where to save files} --processes 2

To unzip the audio files to a new folder, run the line below. Here a larger number of processes can be used:

python sounddescs_download_audios.py --action unzipping --processes 20 --download_folder {location where to save files}

To resample the audio files to 16kHz and put them in the format needed to run CE, MoEE, and MMT, run the following command:

python sounddescs_wavs_transforms.py --exp resample --initial_folder {location where files were saved before} --dest_folder {location where resampled files are stored} --processes 20
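As a rough illustration of what resampling to 16kHz involves, the sketch below builds an ffmpeg command for a single file. It is not the code in sounddescs_wavs_transforms.py; the function names and the output naming scheme are assumptions made for the example.

```python
import subprocess
from pathlib import Path


def build_resample_command(src, dest_folder, sample_rate=16000):
    """Return an ffmpeg command that writes a resampled WAV copy of `src`.

    Hypothetical helper: the output naming scheme is an assumption.
    """
    src = Path(src)
    dest = Path(dest_folder) / (src.stem + ".wav")
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", str(src),            # input audio file
        "-ar", str(sample_rate),   # target sample rate in Hz
        str(dest),
    ]


def resample(src, dest_folder, sample_rate=16000):
    """Run ffmpeg on one file; raises CalledProcessError if ffmpeg fails."""
    Path(dest_folder).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_resample_command(src, dest_folder, sample_rate), check=True)
```

Building the command separately from running it makes it easy to print and inspect the exact ffmpeg call before processing a large folder.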

Other files available that might prove useful are found in the sounddescs_data folder. The files are:

  • categories.pkl - this file contains tags for most audio files. These tags can be Nature, Clocks, Sport etc. Some files have more than one tag and some have no tags.
  • descriptions.pkl - this file contains the descriptions associated with the audio files. These are used as captions in our CE, MoEE and MMT experiments.
  • extra_info.pkl - this file contains information about the audio content such as file type (e.g. MP3) or sample rate (e.g. 44.1kHz).
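
A minimal sketch for inspecting these files. The internal layout of the pickles is not described here, so the code only reports the type, size, and a few sample entries of whatever object is stored; the helper names are hypothetical.

```python
import pickle


def load_pickle(path):
    """Load one of the SoundDescs metadata files."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def summarise(obj, n=3):
    """Return the object's type name, its size, and up to `n` sample entries."""
    if isinstance(obj, dict):
        sample = list(obj.items())[:n]
    else:
        # Assumption: any non-dict payload is a sized, iterable container.
        sample = list(obj)[:n]
    return type(obj).__name__, len(obj), sample
```

For example, summarise(load_pickle("sounddescs_data/descriptions.pkl")) shows how the descriptions are keyed before they are matched to audio files.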

Terms and conditions for SoundDescs dataset

To download and use the SoundDescs dataset, you need to comply with the terms and conditions of the RemArc Licence.

This is from the official website that hosts the data:

By continuing, you agree to comply with the terms of the RemArc Licence for this and any future downloads.

Commercial use of this content is not allowed under the RemArc license.

For commercial use, buy the sound effect from Pro Sound Effects which can be found in the More Detail section for each sound effect.

Evaluating pretrained CE, MoEE, and MMT models on multiple seeds and reproducing results

To reproduce results for the CE, MoEE, and MMT models in the tables below, multiple models trained with different seeds need to be downloaded and evaluated on the test sets.

The steps needed to reproduce results are:

  1. Picking the experiment to be reproduced. Experiment names follow a fixed form, for example audiocaps-train-full-ce-r2p1d-inst-vggish-vggsound. Tables with experiment names and the corresponding form can be found in misc/exps-names.md.
  2. Downloading the features and splits corresponding to the dataset for which the experiment is run. For example for AudioCaps run:
# fetch the pretrained experts for AudioCaps
python3 misc/sync_experts.py --dataset AudioCaps

Additional examples for the datasets used in this paper can be found in misc/exps-names.md.

  3. Running the eval.py script.

For example, to reproduce the experiments for AudioCaps with complete visual and audio experts, run the following line:

python eval.py --experiment audiocaps-train-full-ce-r2p1d-inst-vggish-vggsound

If the --experiment flag is not provided, the eval.py script will download and evaluate all CE and MoEE models on the test set.
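
The result tables report each metric as a value followed by a number in parentheses, which reads as the mean over seeds and its standard deviation. A small sketch of that aggregation, assuming per-seed metrics are available as plain dictionaries. The numbers are made up for illustration, and the use of the sample standard deviation is an assumption rather than something confirmed by eval.py.

```python
from statistics import mean, stdev


def aggregate_over_seeds(per_seed_metrics):
    """Combine per-seed metric dicts into 'mean(std)' strings."""
    summary = {}
    for name in per_seed_metrics[0]:
        values = [run[name] for run in per_seed_metrics]
        # Sample standard deviation; undefined for a single run, so use 0.
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[name] = f"{mean(values):.1f}({spread:.1f})"
    return summary


# Made-up numbers for three seeds, for illustration only.
runs = [
    {"R@1": 18.2, "MdR": 6.0},
    {"R@1": 18.5, "MdR": 6.0},
    {"R@1": 18.8, "MdR": 6.0},
]
print(aggregate_over_seeds(runs))
```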

Training a new model

Training a new CE audio-text embedding requires:

  1. The pretrained experts for the dataset used for training, which should be located in /data/ /symlinked-feats (this will be done automatically by the utility script, or can be done manually). Examples can be found in misc/exps-names.md.
  2. A config.json file. You can define your own, or use one of the provided configs in the configs directory.

Training is then performed with the following command:

python3 train.py --config {path to config.json} --device {gpu id}

where {gpu id} is the index of the GPU to train on. This option can be omitted for training on the CPU.

For example, to train a new embedding for the CLOTHO dataset, run the following sequence of commands:

# fetch the pretrained experts for CLOTHO
python3 misc/sync_experts.py --dataset CLOTHO

# Train the model
python3 train.py --config configs/clotho/train-vggish-vggsound.json --device 0

To train MMT, use the following command:

python -m mmt/train.py --config {path to config.json}

For example, to train MMT on the CLOTHO dataset, run the following sequence of commands:

# fetch the pretrained experts for CLOTHO
python3 misc/sync_experts.py --dataset CLOTHO

# Train MMT on CLOTHO
python -m mmt/train --config mmt/configs/clotho/Clotho_mmt.json

AudioCaps

These are the retrieval results obtained for the AudioCaps dataset when using only audio experts:

Experts Task R@1 R@5 R@10 R@50 MdR MnR Geom params Links
CE - VGGish t2v 18.5(0.3) 47.4(0.1) 62.0(0.5) 89.3(0.3) 6.0(0.0) 22.7(0.3) 37.9(0.1) 7.39M config, model
CE - VGGish v2t 20.7(1.8) 48.6(0.7) 62.9(0.4) 86.9(0.2) 6.0(0.0) 25.4(1.3) 39.8(1.3) 7.39M config, model
CE - VGGSound t2v 22.4(0.3) 53.9(1.2) 69.2(0.9) 91.4(1.6) 5.0(0.0) 19.9(3.4) 43.7(0.5) 12.12M config, model
CE - VGGSound v2t 27.0(0.9) 57.8(0.3) 72.5(0.7) 92.6(0.3) 4.0(0.0) 17.5(1.8) 48.3(0.7) 12.12M config, model
CE - VGGish + VGGSound t2v 23.6(0.6) 56.2(0.5) 71.4(0.5) 92.3(1.5) 4.0(0.0) 18.3(3.0) 45.6(0.5) 21.86M config, model
CE - VGGish + VGGSound v2t 27.6(1.0) 60.5(0.7) 74.7(0.8) 94.2(0.4) 4.0(0.0) 14.7(1.4) 50.0(0.6) 21.86M config, model
MoEE - VGGish + VGGSound t2v 23.0(0.7) 55.7(0.3) 71.0(1.2) 93.0(0.3) 4.0(0.0) 16.3(0.5) 45.0(0.8) 8.90M config, model
MoEE - VGGish + VGGSound v2t 26.6(0.7) 59.3(1.4) 73.5(1.1) 94.0(0.5) 4.0(0.0) 15.6(0.8) 48.8(0.8) 8.90M config, model
MMT - VGGish + VGGSound t2v 36.1(3.3) 72.0(2.9) 84.5(2.0) 97.6(0.4) 2.3(0.6) 7.5(1.3) 60.3(2.8) 127.08M config, model
MMT - VGGish + VGGSound v2t 39.6(0.2) 76.8(0.9) 86.7(1.8) 98.2(0.4) 2.0(0.0) 6.5(0.5) 64.1(0.5) 127.08M config, model
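
The columns follow common retrieval conventions: recall at rank k (R@k, higher is better), median and mean rank of the correct item (MdR and MnR, lower is better), and Geom, which matches the geometric mean of R@1, R@5 and R@10 (for the first row, the cube root of 18.5 × 47.4 × 62.0 is approximately 37.9). The sketch below computes these from a query-by-item similarity matrix. It is an illustration, not the repository's evaluation code: the function name is hypothetical, and it assumes exactly one correct item per query, located on the diagonal.

```python
from statistics import mean, median


def retrieval_metrics(sims, ks=(1, 5, 10)):
    """Compute R@k, MdR, MnR and the geometric mean of the R@k values.

    `sims[i][j]` is the similarity between query i and item j; the
    correct item for query i is assumed to be item i.
    """
    ranks = []
    for i, row in enumerate(sims):
        # Rank = 1 + number of items scored strictly higher than the target.
        ranks.append(1 + sum(score > row[i] for score in row))

    metrics = {}
    for k in ks:
        # Percentage of queries whose correct item is ranked in the top k.
        metrics[f"R@{k}"] = 100.0 * sum(r <= k for r in ranks) / len(ranks)
    metrics["MdR"] = median(ranks)
    metrics["MnR"] = mean(ranks)

    product = 1.0
    for k in ks:
        product *= metrics[f"R@{k}"]
    metrics["Geom"] = product ** (1.0 / len(ks))
    return metrics
```

Ties are resolved optimistically here (only strictly higher scores push the target down), which is one of several reasonable choices.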

Using only visual experts for AudioCaps:

Experts Task R@1 R@5 R@10 R@50 MdR MnR Geom params Links
CE - Scene t2v 6.0(0.0) 22.9(0.5) 35.6(0.8) 70.4(0.6) 19.0(0.0) 69.1(4.6) 16.9(0.3) 7.51M config, model
CE - Scene v2t 6.8(0.6) 22.1(0.9) 31.9(1.3) 62.9(0.3) 26.3(1.4) 121.3(6.8) 16.9(0.8) 7.51M config, model
CE - R2P1D t2v 8.1(0.4) 30.0(0.4) 45.8(0.2) 77.2(0.9) 12.5(0.5) 56.6(4.6) 22.3(0.5) 6.21M config, model
CE - R2P1D v2t 10.7(0.1) 30.4(1.5) 43.4(1.9) 75.0(1.0) 14.3(1.2) 78.2(1.6) 24.2(0.7) 6.21M config, model
CE - Inst t2v 8.2(0.3) 29.7(0.5) 46.2(0.5) 79.2(1.3) 12.0(0.0) 50.4(7.3) 22.4(0.4) 7.38M config, model
CE - Inst v2t 10.1(0.8) 28.0(1.4) 41.3(0.6) 75.8(0.7) 15.0(1.0) 85.8(2.4) 22.7(0.9) 7.38M config, model
CE - Scene + R2P1D t2v 8.6(0.1) 30.9(0.0) 47.4(0.2) 79.1(0.8) 11.3(0.6) 51.2(3.4) 23.3(0.0) 16.07M config, model
CE - Scene + R2P1D v2t 11.6(0.4) 31.5(0.9) 43.5(0.8) 75.8(0.4) 14.8(0.8) 69.9(2.6) 25.1(0.3) 16.07M config, model
CE - Scene + Inst t2v 8.2(0.3) 30.4(0.3) 47.1(0.2) 78.9(1.8) 12.0(0.0) 51.7(8.8) 22.7(0.3) 17.25M config, model
CE - Scene + Inst v2t 10.2(1.2) 29.0(1.5) 41.5(1.3) 74.5(0.2) 15.7(0.6) 83.8(2.9) 23.0(0.6) 17.25M config, model
CE - R2P1D + Inst t2v 9.5(0.6) 33.0(1.0) 50.0(0.5) 81.1(0.9) 10.3(0.6) 45.9(3.8) 25.0(0.8) 15.95M config, model
CE - R2P1D + Inst v2t 11.2(0.1) 31.3(1.5) 45.2(1.9) 77.4(0.7) 13.0(1.0) 68.5(0.7) 25.1(0.8) 15.95M config, model

Visual and audio experts for AudioCaps:

Experts Task R@1 R@5 R@10 R@50 MdR MnR Geom params Links
CE - R2P1D + Inst + VGGish t2v 24.5(0.8) 59.0(0.6) 74.9(1.0) 94.5(0.7) 4.0(0.0) 14.3(1.2) 47.6(0.7) 23.32M config, model
CE - R2P1D + Inst + VGGish v2t 31.0(2.2) 64.5(1.0) 78.8(1.2) 95.5(0.1) 3.0(0.0) 11.4(0.9) 54.0(1.8) 23.32M config, model
CE - R2P1D + Inst + VGGSound t2v 27.6(0.2) 63.8(0.6) 78.0(0.8) 94.7(0.1) 3.0(0.0) 13.4(0.8) 51.6(0.2) 28.05M config, model
CE - R2P1D + Inst + VGGSound v2t 32.7(0.9) 69.2(1.0) 82.4(0.4) 96.8(0.3) 2.8(0.3) 9.3(0.2) 57.1(0.7) 28.05M config, model
CE - R2P1D + Inst + VGGish + VGGSound t2v 28.0(0.5) 65.3(0.7) 80.4(0.3) 96.0(0.5) 3.0(0.0) 10.8(0.5) 52.8(0.4) 35.43M config, model
CE - R2P1D + Inst + VGGish + VGGSound v2t 35.8(0.6) 70.2(1.6) 83.3(0.6) 98.3(0.4) 2.0(0.0) 7.8(0.5) 59.4(0.4) 35.43M config, model

CLOTHO

Experts Task R@1 R@5 R@10 R@50 MdR MnR Geom params Links
CE - VGGish t2v 4.0(0.2) 15.0(0.9) 25.4(0.5) 61.4(1.1) 31.7(1.5) 78.2(2.2) 11.5(0.5) 7.39M config, model
CE - VGGish v2t 4.8(0.4) 15.9(1.8) 25.8(1.7) 57.5(2.5) 35.7(2.5) 106.6(5.7) 12.5(1.0) 7.39M config, model
CE - VGGish + VGGSound t2v 6.7(0.4) 21.6(0.6) 33.2(0.3) 69.8(0.3) 22.3(0.6) 58.3(1.1) 16.9(0.2) 21.86M config, model
CE - VGGish + VGGSound v2t 7.0(0.3) 22.7(0.6) 34.6(0.5) 67.9(2.3) 21.3(0.6) 72.6(3.4) 17.7(0.3) 21.86M config, model
MoEE - VGGish + VGGSound t2v 6.0(0.1) 20.8(0.7) 32.3(0.3) 68.5(0.5) 23.0(0.0) 60.2(0.8) 16.0(0.3) 8.90M config, model
MoEE - VGGish + VGGSound v2t 7.2(0.5) 22.1(0.7) 33.2(1.1) 67.4(0.3) 22.7(0.6) 71.8(2.3) 17.4(0.7) 8.90M config, model
MMT - VGGish + VGGSound t2v 6.5(0.6) 21.6(0.7) 32.8(2.1) 66.9(2.0) 23.0(2.6) 67.7(3.1) 16.6(1.1) 127.08M config, model
MMT - VGGish + VGGSound v2t 6.3(0.5) 22.8(1.7) 33.3(2.2) 67.8(1.5) 22.3(1.5) 67.3(2.9) 16.8(1.0) 127.08M config, model

SoundDescs

Experts Task R@1 R@5 R@10 R@50 MdR MnR Geom params Links
CE - VGGish t2v 25.4(0.6) 53.3(0.3) 64.1(0.3) 81.7(0.4) 4.7(0.6) 83.7(1.9) 44.3(0.3) 7.39M config, model
CE - VGGish v2t 24.2(0.3) 52.3(0.3) 62.5(0.2) 80.9(0.3) 5.0(0.0) 83.6(1.1) 42.9(0.3) 7.39M config, model
CE - VGGish + VGGSound t2v 31.1(0.2) 60.6(0.7) 70.8(0.5) 86.0(0.2) 3.0(0.0) 63.6(2.2) 51.1(0.4) 21.86M config, model
CE - VGGish + VGGSound v2t 30.8(0.8) 60.3(0.3) 69.5(0.1) 85.4(0.2) 3.0(0.0) 63.2(0.6) 50.5(0.4) 21.86M config, model
MoEE - VGGish + VGGSound t2v 30.8(0.7) 60.8(0.3) 70.9(0.5) 85.9(0.6) 3.0(0.0) 62.0(3.8) 51.0(0.6) 8.90M config, model
MoEE - VGGish + VGGSound v2t 30.9(0.3) 60.3(0.4) 70.1(0.3) 85.3(0.6) 3.0(0.0) 61.5(3.2) 50.7(0.3) 8.90M config, model
MMT - VGGish + VGGSound t2v 30.7(0.4) 61.8(1.0) 72.2(0.8) 88.8(0.4) 3.0(0.0) 34.0(0.6) 51.5(0.5) 127.08M config, model
MMT - VGGish + VGGSound v2t 31.4(0.8) 63.2(0.7) 73.4(0.5) 89.0(0.3) 3.0(0.0) 32.5(0.4) 52.6(0.7) 127.08M config, model

Pretraining on SoundDescs, finetuning on AudioCaps

Experts Task R@1 R@5 R@10 R@50 MdR MnR Geom params Links
CE - VGGish + VGGSound t2v 23.3(0.7) 52.2(0.1) 63.9(0.5) 84.3(0.3) 5.0(0.0) 59.9(1.6) 42.7(0.5) 21.86M config, model
CE - VGGish + VGGSound v2t 22.2(0.4) 51.7(0.3) 63.3(0.3) 83.8(0.4) 5.0(0.0) 59.2(0.5) 41.7(0.2) 21.86M config, model

Pretraining on AudioCaps, finetuning on CLOTHO

Experts Task R@1 R@5 R@10 R@50 MdR MnR Geom params Links
CE - VGGish + VGGSound t2v 9.1(0.3) 27.4(0.1) 39.7(0.4) 75.0(0.4) 17.0(0.0) 48.6(0.7) 21.5(0.1) 21.86M config, model
CE - VGGish + VGGSound v2t 11.1(1.1) 26.9(0.7) 39.6(1.1) 73.7(0.6) 16.3(0.6) 57.4(1.8) 22.8(1.2) 21.86M config, model

Pretraining on SoundDescs, finetuning on CLOTHO

Experts Task R@1 R@5 R@10 R@50 MdR MnR Geom params Links
CE - VGGish + VGGSound t2v 6.4(0.5) 21.1(1.2) 32.5(1.7) 69.3(1.4) 22.7(1.5) 57.6(2.3) 16.3(1.0) 21.86M config, model
CE - VGGish + VGGSound v2t 6.1(0.7) 20.1(1.7) 31.4(1.8) 65.9(2.0) 24.7(1.5) 78.1(5.3) 15.7(1.3) 21.86M config, model

Pretraining on AudioCaps, finetuning on SoundDescs

Experts Task R@1 R@5 R@10 R@50 MdR MnR Geom params Links
CE - VGGish + VGGSound t2v 23.3(0.7) 52.2(0.1) 63.9(0.5) 84.3(0.3) 5.0(0.0) 59.9(1.6) 42.7(0.5) 21.86M config, model
CE - VGGish + VGGSound v2t 22.2(0.4) 51.7(0.3) 63.3(0.3) 83.8(0.4) 5.0(0.0) 59.2(1.3) 41.7(0.2) 21.86M config, model

Visual centric datasets

Experts Dataset Task R@1 R@5 R@10 R@50 MdR MnR Geom params Links
CE - VGGish QuerYD t2v 3.7(0.2) 11.7(0.4) 17.3(0.6) 36.3(0.3) 115.5(5.2) 273.5(6.7) 9.1(0.0) 7.39M config, model
CE - VGGish QuerYD v2t 3.8(0.2) 11.5(0.4) 16.8(0.2) 35.2(0.4) 116.3(2.1) 271.9(5.8) 9.0(0.2) 7.39M config, model
CE - VGGish Activity-Net t2v 1.4(0.1) 5.0(0.1) 8.5(0.2) 22.1(0.9) 312.0(25.6) 765.6(35.8) 3.9(0.1) 7.39M config, model
CE - VGGish Activity-Net v2t 1.1(0.1) 4.5(0.1) 7.9(0.0) 21.6(0.8) 306.3(27.1) 781.7(30.6) 3.4(0.1) 7.39M config, model

More information can be found at our project page: https://www.robots.ox.ac.uk/~vgg/research/audio-retrieval/

References

If you find this code useful, please consider citing [1,2,3,4].

[1]

@inproceedings{Koepke2021,
    author    = {Koepke, A.S. and Oncescu, A.-M. and Henriques, J. and Akata, Z. and Albanie, S.},
    title     = {Audio Retrieval with Natural Language Queries: A Benchmark Study},
    booktitle = {arXiv preprint arXiv:2112.09418},
    year      = {2021}
}

[2]

@inproceedings{Oncescu21a,
    author    = {Oncescu, A.-M. and Koepke, A.S. and Henriques, J. and Akata, Z. and Albanie, S.},
    title     = {Audio Retrieval with Natural Language Queries},
    booktitle = {INTERSPEECH},
    year      = {2021}
}

[3]

@inproceedings{Liu2019a,
    author    = {Liu, Y. and Albanie, S. and Nagrani, A. and Zisserman, A.},
    title     = {Use What You Have: Video retrieval using representations from collaborative experts},
    booktitle = {British Machine Vision Conference (BMVC)},
    year      = {2019},
}

[4]

@inproceedings{gabeur2020mmt,
    author    = {Gabeur, V. and Sun, C. and Alahari, K. and Schmid, C.},
    title     = {Multi-modal Transformer for Video Retrieval},
    booktitle = {European Conference on Computer Vision (ECCV)},
    year      = {2020}
}