Ukrainian TTS (text-to-speech) using Coqui TTS

Overview
title: Ukrainian TTS
emoji: 🐸
colorFrom: green
colorTo: green
sdk: gradio
app_file: app.py
pinned: false

Ukrainian TTS 📢 🤖

Ukrainian TTS (text-to-speech) using Coqui TTS.

Trained on the M-AILABS Ukrainian dataset using the sumska voice.

Link to online demo -> https://huggingface.co/spaces/robinhad/ukrainian-tts

Support

If you like my work, please support -> SUPPORT LINK

Example

test.mp4

How to use:

  1. Install the dependencies: pip install -r requirements.txt
  2. Download the model from the "Releases" tab.
  3. Run synthesis as a one-time command:
tts --text "Text for TTS" \
    --model_path path/to/model.pth.tar \
    --config_path path/to/config.json \
    --out_path folder/to/save/output.wav

Alternatively, launch a web server:

tts-server --model_path path/to/model.pth.tar \
    --config_path path/to/config.json
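Since v2.0.0 the release assets include a speakers.pth file, and the issues below report that the one-time command fails on these multi-speaker checkpoints unless a speaker is given. The following is a minimal Python sketch that assembles the command with Coqui TTS's `--speakers_file_path` and `--speaker_idx` flags. It assumes these flags match your installed Coqui TTS version (check `tts --help`). The speaker id `mykyta` is hypothetical; the real ids stored in speakers.pth can be listed with `--list_speaker_idxs`.

```python
import shlex
from pathlib import Path
from typing import List, Optional


def build_tts_command(
    text: str,
    model_path: Path,
    config_path: Path,
    out_path: Path,
    speakers_file: Optional[Path] = None,
    speaker: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for Coqui TTS's `tts` command-line tool."""
    cmd = [
        "tts",
        "--text", text,
        "--model_path", str(model_path),
        "--config_path", str(config_path),
        "--out_path", str(out_path),
    ]
    # Multi-speaker checkpoints need the speaker mapping file and a speaker id;
    # omitting the speaker triggers the "multi-speaker model" ValueError.
    if speakers_file is not None:
        cmd += ["--speakers_file_path", str(speakers_file)]
    if speaker is not None:
        cmd += ["--speaker_idx", speaker]
    return cmd


if __name__ == "__main__":
    cmd = build_tts_command(
        text="Text for TTS",
        model_path=Path("path/to/model.pth.tar"),
        config_path=Path("path/to/config.json"),
        out_path=Path("folder/to/save/output.wav"),
        speakers_file=Path("path/to/speakers.pth"),
        speaker="mykyta",  # hypothetical id; list real ones with --list_speaker_idxs
    )
    # Print a copy-pasteable shell command; pass `cmd` to subprocess.run() to execute it.
    print(shlex.join(cmd))
```

The function builds an argument list rather than a shell string, so text containing quotes or Cyrillic characters needs no manual escaping when it is passed to `subprocess.run`.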

How to train:

  1. Refer to the "Nervous beginner guide" in the Coqui TTS docs.
  2. Use the config.json from this repo instead of the provided one.

Attribution

Code for app.py taken from https://huggingface.co/spaces/julien-c/coqui

Comments
  • Error with file: speakers.pth

    FileNotFoundError: [Errno 2] No such file or directory: '/home/user/Soft/Python/mamba1/TTS/vits_mykyta_latest-September-12-2022_12+38AM-829e2c24/speakers.pth'

    opened by akirsoft 4
  • doc: fix examples in README

    Problem

    The one-time snippet does not work as-is; it complains that the speaker is not defined:

     > initialization of speaker-embedding layers.
     > Text: Перевірка мікрофона
     > Text splitted to sentences.
    ['Перевірка мікрофона']
    Traceback (most recent call last):
      File "/home/serg/.local/bin/tts", line 8, in <module>
        sys.exit(main())
      File "/home/serg/.local/lib/python3.8/site-packages/TTS/bin/synthesize.py", line 350, in main
        wav = synthesizer.tts(
      File "/home/serg/.local/lib/python3.8/site-packages/TTS/utils/synthesizer.py", line 228, in tts
        raise ValueError(
    ValueError:  [!] Look like you use a multi-speaker model. You need to define either a `speaker_name` or a `speaker_wav` to use a multi-speaker model.
    

    Also, speakers.pth needs to be downloaded.

    Fix

    Just a few documentation changes:

    • make instructions on what to download from Releases more precise
    • add --speaker_id argument with one of the speakers
    opened by seriar 2
  • One-vowel words at the end of a sentence aren't stressed

    Input:

    
    Бобер на березі з бобренятами бублики пік.
    
    Боронила борона по боронованому полю.
    
    Ішов Прокіп, кипів окріп, прийшов Прокіп - кипить окріп, як при Прокопі, так і при Прокопі і при Прокопенятах.
    
    Сидить Прокоп — кипить окроп, Пішов Прокоп — кипить окроп. Як при Прокопові кипів окроп, Так і без Прокопа кипить окроп.
    

    Result:

    
    Боб+ер н+а березі з бобрен+ятами б+ублики пік.
    
    Борон+ила борон+а п+о борон+ованому п+олю.
    
    Іш+ов Пр+окіп, кип+ів окр+іп, прийш+ов Пр+окіп - кип+ить окр+іп, +як пр+и Пр+окопі, т+ак +і пр+и Пр+окопі +і пр+и Прокопенятах.
    
    Сид+ить Прок+оп — кип+ить окроп, Піш+ов Прок+оп — кип+ить окроп. +Як пр+и Пр+окопові кип+ів окроп, Т+ак +і б+ез Пр+окопа кип+ить окроп.
    opened by robinhad 0
  • Error importing StressOption

    Traceback (most recent call last):
      File "/home/user/Soft/Python/mamba1/test.py", line 1, in <module>
        from ukrainian_tts.tts import TTS, Voices, StressOption
    ImportError: cannot import name 'StressOption' from 'ukrainian_tts.tts'

    opened by akirsoft 0
  • Vits improvements

    from TTS.tts.models.vits import VitsArgs

    vitsArgs = VitsArgs(
        # HiFi-GAN V3 decoder settings
        resblock_type_decoder="2",
        upsample_rates_decoder=[8, 8, 4],
        upsample_kernel_sizes_decoder=[16, 16, 8],
        upsample_initial_channel_decoder=256,
        resblock_kernel_sizes_decoder=[3, 5, 7],
        resblock_dilation_sizes_decoder=[[1, 2], [2, 6], [3, 12]],
    )
    
    opened by robinhad 0
  • Model improvement checklist

    • [x] Add Ukrainian accentor - https://github.com/egorsmkv/ukrainian-accentor
    • [ ] Fine-tune from existing checkpoint (e.g. VITS Ljspeech)
    • [ ] Try to increase fft_size, hop_length to match sample_rate accordingly
    • [ ] Include more dataset samples into model
    opened by robinhad 0
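The "Vits improvements" decoder settings and the checklist item about changing hop_length are linked. In a HiFi-GAN-style decoder, each upsampling stage stretches the time axis by its rate, so one spectrogram-rate frame becomes prod(upsample_rates) waveform samples. That product therefore has to equal hop_length. The following plain-Python sketch performs this consistency check. The hop_length values used in it are assumptions for illustration and are not taken from this repo's config.json.

```python
from math import prod
from typing import Sequence


def decoder_upsampling_factor(upsample_rates: Sequence[int]) -> int:
    """Total time-axis upsampling of a HiFi-GAN-style decoder (product of its rates)."""
    return prod(upsample_rates)


def check_decoder_matches_hop(upsample_rates: Sequence[int], hop_length: int) -> None:
    """Raise ValueError unless one spectrogram frame maps to exactly hop_length samples."""
    factor = decoder_upsampling_factor(upsample_rates)
    if factor != hop_length:
        raise ValueError(
            f"upsample rates {list(upsample_rates)} give x{factor}, "
            f"but hop_length is {hop_length}"
        )


if __name__ == "__main__":
    # Rates from the "Vits improvements" issue; hop_length=256 is an assumed value.
    check_decoder_matches_hop([8, 8, 4], hop_length=256)
    # Hypothetical: doubling hop_length would need an extra x2 upsampling stage.
    check_decoder_matches_hop([8, 8, 4, 2], hop_length=512)
    print("decoder upsampling matches hop_length")
```

Running a check like this before training catches a mismatched decoder early, rather than after it produces audio of the wrong length.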
Releases (v4.0.0)
  • v4.0.0 (Dec 10, 2022)

  • v3.0.0 (Sep 14, 2022)

    This is a release of the Ukrainian TTS model and checkpoint. The model is licensed under the GNU GPL v3. This release supports stress marking with a + sign placed before vowels. The model was trained for 280,000 steps by @robinhad. Kudos to @egorsmkv for providing the dataset for this model, to @proger for providing alignment scripts, and to @dchaplinsky for the Dmytro voice.

    Example:

    Test sentence:

    К+ам'ян+ець-Под+ільський - м+істо в Хмельн+ицькій +області Укра+їни, ц+ентр Кам'ян+ець-Под+ільської міськ+ої об'+єднаної територі+альної гром+ади +і Кам'ян+ець-Под+ільського рай+ону.
    

    Mykyta (male):

    https://user-images.githubusercontent.com/5759207/190852232-34956a1d-77a9-42b9-b96d-39d0091e3e34.mp4

    Olena (female):

    https://user-images.githubusercontent.com/5759207/190852238-366782c1-9472-45fc-8fea-31346242f927.mp4

    Dmytro (male):

    https://user-images.githubusercontent.com/5759207/190852251-db105567-52ba-47b5-8ec6-5053c3baac8c.mp4

    Olha (female):

    https://user-images.githubusercontent.com/5759207/190852259-c6746172-05c4-4918-8286-a459c654eef1.mp4

    Lada (female):

    https://user-images.githubusercontent.com/5759207/190852270-7aed2db9-dc08-4a9f-8775-07b745657ca1.mp4

    Source code (tar.gz)
    Source code (zip)
    config.json (12.07 KB)
    model-inference.pth (329.95 MB)
    model.pth (989.97 MB)
    speakers.pth (495 bytes)
  • v2.0.0 (Jul 10, 2022)

    This is a release of the Ukrainian TTS model and checkpoint, using 7 hours of voice data from the Mykyta dataset. The model is licensed under the GNU GPL v3. This release supports stress marking with a + sign placed before vowels. The model was trained for 140,000 steps by @robinhad. Kudos to @egorsmkv for providing the Mykyta and Olena datasets.

    Example:

    Test sentence:

    К+ам'ян+ець-Под+ільський - м+істо в Хмельн+ицькій +області Укра+їни, ц+ентр Кам'ян+ець-Под+ільської міськ+ої об'+єднаної територі+альної гром+ади +і Кам'ян+ець-Под+ільського рай+ону.
    

    Mykyta (male):

    https://user-images.githubusercontent.com/5759207/178158485-29a5d496-7eeb-4938-8ea7-c345bc9fed57.mp4

    Olena (female):

    https://user-images.githubusercontent.com/5759207/178158492-8504080e-2f13-43f1-83f0-489b1f9cd66b.mp4

    Source code (tar.gz)
    Source code (zip)
    config.json (9.97 KB)
    model-inference.pth (329.95 MB)
    model.pth (989.72 MB)
    optimized.pth (329.95 MB)
    speakers.pth (431 bytes)
  • v2.0.0-beta (May 8, 2022)

    This is a beta release of the Ukrainian TTS model and checkpoint, using 7 hours of voice data from the Mykyta dataset. The model is licensed under the GNU GPL v3. This release supports stress marking with a + sign placed before vowels. The model was trained for 150,000 steps by @robinhad. Kudos to @egorsmkv for providing the Mykyta dataset.

    Example:

    https://user-images.githubusercontent.com/5759207/167305810-2b023da7-0657-44ac-961f-5abf1aa6ea7d.mp4

    Source code (tar.gz)
    Source code (zip)
    config.json (8.85 KB)
    LICENSE (34.32 KB)
    model-inference.pth (317.15 MB)
    model.pth (951.32 MB)
    tts_output.wav (1.11 MB)
  • v1.0.0 (Jan 14, 2022)

  • v0.0.1 (Oct 14, 2021)
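The releases mark stress with a + sign placed before a vowel. The issue about one-vowel words shows the accentor leaving such a word unmarked (пік at the end of the first sentence). A word with a single vowel can only be stressed on that vowel, so one possible workaround is to add the mark before synthesis. The sketch below is a hypothetical helper and is not part of ukrainian_tts. It assumes that the ten Ukrainian vowel letters (а е є и і ї о у ю я) are the only stress targets, and it leaves any word that already contains a + unchanged.

```python
import re

# Ukrainian vowel letters (assumed stress targets), lower- and uppercase.
VOWELS = set("аеєиіїоуюяАЕЄИІЇОУЮЯ")
# A "word" here is a run of letters, apostrophes, hyphens or existing + marks.
WORD_RE = re.compile(r"[\w'+-]+")


def stress_single_vowel_words(text: str) -> str:
    """Insert '+' before the only vowel of words that have exactly one vowel
    and no stress mark yet. Hypothetical helper, not part of ukrainian_tts."""

    def fix(match: re.Match) -> str:
        word = match.group(0)
        if "+" in word:
            return word  # already stressed by the accentor
        vowel_positions = [i for i, ch in enumerate(word) if ch in VOWELS]
        if len(vowel_positions) != 1:
            return word  # zero or several vowels: leave it to the accentor
        i = vowel_positions[0]
        return word[:i] + "+" + word[i:]

    return WORD_RE.sub(fix, text)


if __name__ == "__main__":
    accented = "Боб+ер н+а березі з бобрен+ятами б+ублики пік."
    # prints: Боб+ер н+а березі з бобрен+ятами б+ублики п+ік.
    print(stress_single_vowel_words(accented))
```

Because the helper only touches words that have no + and exactly one vowel, it can safely run after the accentor without overriding the stresses the accentor has already placed.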
