A simple project that separates a mixture of two clean voices into two separate voices.

Last update: Oct 30, 2022

Related tags

Text Data & NLP speech_separation_PIT

Overview

Speech Separation

Result Example (Click to hear the voices): mix || prediction voice1 || prediction voice2

Mix Spectrogram

Predicted Voice 1 Spectrogram

Predicted Voice 2 Spectrogram

1. Quick train

Step 1:

Download LibriMixSmall, extract it and move it to the root of the project.

Step 2:

./train.sh

Training takes only about 2-3 hours on a normal GPU. After each epoch, the prediction is written to the ./viz_outout folder.

2. Quick inference

./inference.sh

The result will be written to the ./viz_outout folder.

3. More detail

Input: The complex spectrogram, computed from the raw mixed audio signal.
Output: The complex ratio mask (cRM) ---> complex spectrogram ---> separated voices.
Model: A simplified version of this implementation, which follows the paper "Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation".
Loss function: Permutation Invariant Training (PIT) loss and pairwise negative SI-SDR loss (the more state-of-the-art option).
Dataset: A small version of the LibriMix dataset, taken from LibriMixSmall.
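
The two ideas in the list above, applying a cRM to the mixture spectrogram and scoring the result with a permutation invariant negative SI-SDR loss, can be sketched in a few lines. This is a minimal pure-Python illustration, not the code used in this project (which presumably works on batched tensors). The function names `apply_crm`, `si_sdr` and `pit_neg_si_sdr` are hypothetical and chosen only for this example.

```python
import itertools
import math


def apply_crm(mix_spec, mask):
    """Multiply a complex mixture spectrogram by a complex ratio mask, bin by bin.

    Both arguments are 2-D lists (frequency x time) of Python complex numbers.
    """
    return [[m * s for m, s in zip(mask_row, spec_row)]
            for mask_row, spec_row in zip(mask, mix_spec)]


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant SDR in dB between two equal-length 1-D signals."""
    n = len(target)
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]  # zero-mean estimate
    t = [x - mean_t for x in target]    # zero-mean target
    # Project the estimate onto the target to get the "signal" part.
    scale = sum(a * b for a, b in zip(e, t)) / (sum(b * b for b in t) + eps)
    proj = [scale * b for b in t]
    noise = [a - p for a, p in zip(e, proj)]
    proj_energy = sum(p * p for p in proj)
    noise_energy = sum(x * x for x in noise)
    return 10 * math.log10((proj_energy + eps) / (noise_energy + eps))


def pit_neg_si_sdr(estimates, targets):
    """Return (loss, best_perm): the lowest mean negative SI-SDR over all
    assignments of estimated voices to target voices."""
    best_loss, best_perm = math.inf, None
    for perm in itertools.permutations(range(len(targets))):
        loss = -sum(si_sdr(estimates[i], targets[p])
                    for i, p in enumerate(perm)) / len(targets)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy cRM example: one frequency bin, two time frames.
masked = apply_crm([[1 + 1j, 2 + 0j]], [[0.5 + 0j, 0 + 1j]])

# Toy PIT example: the estimates come out in swapped order and rescaled.
s1 = [math.sin(0.1 * k) for k in range(200)]
s2 = [math.sin(0.37 * k + 1.0) for k in range(200)]
est = [[0.5 * x for x in s2], [2.0 * x for x in s1]]
loss, perm = pit_neg_si_sdr(est, [s1, s2])
print(perm)  # (1, 0): estimate 0 matches target 1, estimate 1 matches target 0
```

Because the loss tries every speaker ordering and keeps the best one, the model is not penalized for which output channel a voice lands on. SI-SDR ignores rescaling, which is why the scaled estimates in the toy example still score well.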

4. Current problem

Because the dataset is kept small for fast training, the model overfits the training set somewhat. Using a bigger dataset will potentially help to overcome that. Some suggestions:

Use the original LibriMix dataset, which is much bigger (around 60 times bigger than what I trained on).
Use this work to download a much larger in-the-wild dataset, and use datasets/VoiceMixtureDataset.py instead of the Libri one that I am using. P.S. I have trained with it and it works too.

Owner

vuthede
