Glow-Speak

glow-speak is a fast, local, neural text-to-speech system that uses eSpeak-ng as a text/phoneme front-end.

Installation

git clone https://github.com/rhasspy/glow-speak.git
cd glow-speak/

python3 -m venv .venv
source .venv/bin/activate
pip3 install --upgrade pip
pip3 install --upgrade setuptools wheel
pip3 install -f 'https://synesthesiam.github.io/prebuilt-apps/' -r requirements.txt

python3 setup.py develop
glow-speak --version

Voices

The following languages/voices are supported:

  • German
    • de_thorsten
  • Chinese
    • cmn_jing_li
  • Greek
    • el_rapunzelina
  • English
    • en-us_ljspeech
    • en-us_mary_ann
  • Spanish
    • es_tux
  • Finnish
    • fi_harri_tapani_ylilammi
  • French
    • fr_siwis
  • Hungarian
    • hu_diana_majlinger
  • Italian
    • it_riccardo_fasol
  • Korean
    • ko_kss
  • Dutch
    • nl_rdh
  • Russian
    • ru_nikolaev
  • Swedish
    • sv_talesyntese
  • Swahili
    • sw_biblia_takatifu
  • Vietnamese
    • vi_vais1000

Usage

Download Voices

glow-speak-download de_thorsten

Command-Line Synthesis

glow-speak -v en-us_mary_ann 'This is a test.' --output-file test.wav
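
The output is an ordinary WAV file, so it can be played back with any audio player, for example:

aplay test.wav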

HTTP Server

glow-speak-http-server --debug

Visit http://localhost:5002
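
That address serves a simple web form for synthesis. For programmatic use, the server also exposes an HTTP API; the request below is only a sketch that assumes a Larynx-style /api/tts endpoint with text and voice query parameters (an assumption, not documented behavior), so check the server's web page or --debug output for the actual route:

# Assumed endpoint and parameter names; verify against your server version.
curl -G 'http://localhost:5002/api/tts' \
    --data-urlencode 'text=This is a test.' \
    --data-urlencode 'voice=en-us_mary_ann' \
    -o test.wav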

Socket Server

Start the server:

glow-speak-socket-server --voice en-us_mary_ann --socket /tmp/glow-speak.sock

From a separate terminal:

echo 'This is a test.' | bin/glow-speak-socket-client --socket /tmp/glow-speak.sock | xargs aplay

Each line sent from the client is synthesized by the server, and the path to the resulting WAV file is returned (usually in /tmp).
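
Because the protocol is plain text over a Unix domain socket (one line of text in, one WAV file path back), you can also talk to the server without the bundled client. A minimal sketch using socat, assuming there is no framing beyond newlines:

echo 'This is a test.' | socat - UNIX-CONNECT:/tmp/glow-speak.sock | xargs aplay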

Comments
  • AssertionError on web interface (only) - and Raspberry Pi Bullseye test

    Hi Michael,

    Great work again!

    I just saw this repository and thought I'd give it a try on my freshly installed Raspberry Pi 4 with 32-bit Raspberry Pi OS Bullseye (Debian 11). Installation almost finished without errors; I only had to fix one thing: sudo apt-get install libatlas-base-dev. After 15 minutes I was already generating audio.

    When I tested en-us_mary_ann and de_thorsten via the web interface, I got this error as soon as my test sentence ended with a question mark:

    DEBUG:glow-speak:ɪ_z ð_ɪ_s ɐ_n_ˈʌ_ð_ɚ t_ˈɛ_s_t? .
    ERROR:glow_speak.http_server:
    Traceback (most recent call last):
      File "/home/pi/glow-speak/.venv/lib/python3.9/site-packages/quart/app.py", line 1490, in full_dispatch_request
        result = await self.dispatch_request(request_context)
      File "/home/pi/glow-speak/.venv/lib/python3.9/site-packages/quart/app.py", line 1536, in dispatch_request
        return await self.ensure_async(handler)(**request_.view_args)
      File "/home/pi/glow-speak/glow_speak/http_server.py", line 484, in app_say
        wav_bytes = await text_to_wav(text, voice, **tts_args)
      File "/home/pi/glow-speak/glow_speak/http_server.py", line 323, in text_to_wav
        text_ids = text_to_ids(
      File "/home/pi/glow-speak/glow_speak/__init__.py", line 110, in text_to_ids
        text_ids = phonemes2ids(
      File "/home/pi/glow-speak/.venv/lib/python3.9/site-packages/phonemes2ids/__init__.py", line 190, in phonemes2ids
        maybe_extend_ids(sub_phoneme, word_ids, append_list=False)
      File "/home/pi/glow-speak/.venv/lib/python3.9/site-packages/phonemes2ids/__init__.py", line 108, in maybe_extend_ids
        maybe_ids = missing_func(phoneme)
      File "/home/pi/glow-speak/glow_speak/__init__.py", line 59, in guess_ids
        typing.List[Phoneme], guess_phonemes(phoneme, self.to_phonemes)
      File "/home/pi/glow-speak/.venv/lib/python3.9/site-packages/gruut_ipa/accent.py", line 159, in guess_phonemes
        assert dist_split is not None
    AssertionError
    

    Maybe some encoding error when reading the web input?

    Speed seems pretty good, comparable to Larynx I'd say, and I noticed the pronunciations have been improved for German.

    opened by fquirin