An evaluation toolkit for voice conversion models.

Last update: Aug 29, 2022

Overview

Voice-conversion-evaluation

Sample test pair

Generate the metadata used to evaluate models.
The parsers directory contains several ready-to-use corpus parsers.

  python sampler.py [name of source corpus] [path of source dir] [name of target corpus] [path of target dir] -n [number of samples] -nt [number of target utterances] -o [path of output dir]

The pairs in the metadata are sorted by src_second, from longest to shortest.
The metadata contains:

source_corpus: The name of the source corpus.
source_corpus_speaker_number: The number of speakers in the source corpus.
source_random_seed: The random seed used for sampling source utterances.
target_corpus: The name of the target corpus.
target_corpus_speaker_number: The number of speakers in the target corpus.
target_random_seed: The random seed used for sampling target utterances.
n_samples: The number of samples.
n_target_samples: The number of target utterances.
pairs: The list of evaluation pairs.
- source_speaker: The name of the source speaker.
- target_speaker: The name of the target speaker.
- src_utt: The path of the source utterance, relative to the source dir.
- tgt_utts: The list of paths of the target utterances, relative to the target dir.
- content: The content of the source utterance.
- src_second: The duration of the source utterance in seconds.
- converted: This entry is not produced by the sampler; you need to add the relative path of your converted output yourself.
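
The field list above can be illustrated with a small sketch that sorts the pairs by src_second and fills in the converted entry. The source does not state the on-disk format of the metadata or how converted files should be named, so the in-memory dict, the function name `add_converted_paths`, and the numbered `.wav` naming scheme are all assumptions made for illustration.

```python
import json


def add_converted_paths(metadata: dict) -> dict:
    """Return a copy of the metadata whose pairs carry a `converted` entry.

    Pairs are sorted by `src_second` from longest to shortest, matching the
    ordering described above. The file naming is a hypothetical scheme.
    """
    pairs = sorted(metadata["pairs"], key=lambda p: p["src_second"], reverse=True)
    updated_pairs = [
        # Copy each pair so the input metadata is left untouched.
        dict(pair, converted=f"{index:04d}.wav")
        for index, pair in enumerate(pairs)
    ]
    return {**metadata, "pairs": updated_pairs}


# Made-up example values; corpus and speaker names are placeholders.
metadata = {
    "source_corpus": "CorpusA",
    "target_corpus": "CorpusB",
    "n_samples": 2,
    "n_target_samples": 1,
    "pairs": [
        {"source_speaker": "s1", "target_speaker": "t1", "src_utt": "s1/a.wav",
         "tgt_utts": ["t1/x.wav"], "content": "hello", "src_second": 2.5},
        {"source_speaker": "s2", "target_speaker": "t2", "src_utt": "s2/b.wav",
         "tgt_utts": ["t2/y.wav"], "content": "good morning", "src_second": 7.1},
    ],
}

updated = add_converted_paths(metadata)
print(json.dumps(updated["pairs"], indent=2))
```

Replace the placeholder naming with the relative paths your own conversion model actually writes.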

Metrics

The metrics include automatic mean opinion score assessment, character error rate, and speaker verification acceptance rate.

Automatic mean opinion score assessment:
- Uses an ensemble of several MBNet models, as implemented by sky1456723.
```
  python calculate_objective_metric.py -d [data_dir] -r metrics/mean_opinion_score
```
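
The source does not say how the predictions of the individual MBNet models are combined. As a rough sketch of the ensemble idea only, the snippet below assumes plain averaging; the `Predictor` type and the stand-in predictors are hypothetical and do not reflect the toolkit's actual interface.

```python
from statistics import mean
from typing import Callable, Sequence

# Hypothetical interface: a predictor maps a wav path to a MOS estimate.
Predictor = Callable[[str], float]


def ensemble_mos(predictors: Sequence[Predictor], wav_path: str) -> float:
    """Average the MOS estimates of several predictors (assumed combination rule)."""
    return mean(predict(wav_path) for predict in predictors)


# Stand-in predictors returning fixed scores, for illustration only.
fake_predictors = [lambda path: 3.0, lambda path: 4.0]
print(ensemble_mos(fake_predictors, "converted/0000.wav"))
```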
Character error rate:
- Uses an automatic speech recognition model provided by Hugging Face.
- The model's word error rate on the LibriSpeech test-other set is 3.9%.
```
  python calculate_objective_metric.py -d [data_dir] -r metrics/character_error_rate
```
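
For reference, character error rate is commonly defined as the character-level edit distance between the reference text and the recognized text, divided by the reference length. The sketch below follows that standard definition; it is not taken from the toolkit, and any text normalization the toolkit applies (case, punctuation, whitespace) is not described in the source and is not reproduced here.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + substitution_cost,   # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the number of reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


print(character_error_rate("hello world", "hallo world"))
```

Here `reference` would be the content field of a pair and `hypothesis` the transcript the recognizer produces for the converted utterance.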
Speaker verification acceptance rate:
- You can calculate the threshold with the code in metrics/speaker_verification/equal_error_rate/.
- Some pre-calculated thresholds are available in metrics/speaker_verification/equal_error_rate/threshold.yaml.
```
  python calculate_objective_metric.py -d [data_dir] -r metrics/speaker_verification -t [target_dir] -th [threshold path]
```
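
The idea behind the threshold and the acceptance rate can be sketched as follows. This is an illustration under stated assumptions, not the toolkit's code: it assumes each converted utterance is scored by cosine similarity between speaker embeddings, that a trial is accepted when its score reaches the threshold, and that the threshold is picked where the false acceptance and false rejection rates are closest. All function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def eer_threshold(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """Pick the score threshold where FAR and FRR are closest to each other."""
    def gap(threshold: float) -> float:
        far = sum(s >= threshold for s in impostor) / len(impostor)  # false accepts
        frr = sum(s < threshold for s in genuine) / len(genuine)     # false rejects
        return abs(far - frr)

    candidates = sorted(set(genuine) | set(impostor))
    return min(candidates, key=gap)


def acceptance_rate(scores: Sequence[float], threshold: float) -> float:
    """Fraction of converted utterances whose score reaches the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


# Made-up scores: same-speaker trials, different-speaker trials, converted outputs.
threshold = eer_threshold([0.9, 0.8, 0.7], [0.1, 0.2, 0.3])
print(threshold, acceptance_rate([0.75, 0.6, 0.9, 0.71], threshold))
```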

Releases

checkpoints (May 17, 2021)
