Malaya-Speech is a Speech-Toolkit library for Bahasa Malaysia, powered by deep learning with TensorFlow.


Documentation

Proper documentation is available at https://malaya-speech.readthedocs.io/

Installing from PyPI

CPU version

$ pip install malaya-speech

GPU version

$ pip install malaya-speech[gpu]

Only Python 3.6.0 and above, and TensorFlow 1.15.0 and above, are supported.

We recommend using virtualenv for development. All examples were tested on TensorFlow 1.15.4, 1.15.5, 2.4.1 and 2.5.
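The version requirements above can be checked before installing. The sketch below is a minimal, standard-library-only check; `parse_version` and `meets_minimum` are hypothetical helpers (not part of Malaya-Speech), and the simple parser drops pre-release tags such as `rc1` instead of implementing full PEP 440 ordering.

```python
import sys


def parse_version(version: str) -> tuple:
    """Turn a dotted version string like "1.15.4" into a tuple of ints.

    Each component is cut at its first non-digit character, so
    "2.5.0rc1" becomes (2, 5, 0). Good enough for a rough minimum check.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if `installed` is the same as or newer than `minimum`."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so "2.5" compares equal to "2.5.0".
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    python_version = ".".join(str(n) for n in sys.version_info[:3])
    print("Python >= 3.6.0:", meets_minimum(python_version, "3.6.0"))
    try:
        import tensorflow as tf
        print("TensorFlow >= 1.15.0:", meets_minimum(tf.__version__, "1.15.0"))
    except ImportError:
        print("TensorFlow is not installed")
```

For anything beyond a quick sanity check, `packaging.version.Version` handles pre-release and post-release tags properly.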

Features

  • Age Detection, detect age in speech using Finetuned Speaker Vector.
  • Speaker Diarization, diarizing speakers using Pretrained Speaker Vector.
  • Emotion Detection, detect emotions in speech using Finetuned Speaker Vector.
  • Force Alignment, generate a time-aligned transcription of an audio file using RNNT.
  • Gender Detection, detect gender in speech using Finetuned Speaker Vector.
  • Language Detection, detect hyperlocal languages in speech using Finetuned Speaker Vector.
  • Multispeaker Separation, separate multiple speakers using FastSep on 8 kHz audio.
  • Noise Reduction, reduce multilevel noise using STFT UNET.
  • Speaker Change, detect speaker changes using Finetuned Speaker Vector.
  • Speaker Overlap, detect overlapping speakers using Finetuned Speaker Vector.
  • Speaker Vector, calculate similarity between speakers using Pretrained Speaker Vector.
  • Speech Enhancement, enhance voice activities using Waveform UNET.
  • SpeechSplit Conversion, detailed speaking style conversion by disentangling speech into content, timbre, rhythm and pitch using PyWorld and PySPTK.
  • Speech-to-Text, End-to-End Speech-to-Text for Malay, Mixed (Malay, Singlish and Mandarin) and Singlish using RNNT and Wav2Vec2 CTC.
  • Super Resolution, 4x super resolution for waveforms.
  • Text-to-Speech, Text-to-Speech for Malay and Singlish using Tacotron2, FastSpeech2 and FastPitch.
  • Vocoder, convert mel spectrograms to waveforms using MelGAN, Multiband MelGAN and Universal MelGAN vocoders.
  • Voice Activity Detection, detect voice activities using Finetuned Speaker Vector.
  • Voice Conversion, Many-to-One, One-to-Many, Many-to-Many, and Zero-shot Voice Conversion.
  • Hybrid 8-bit Quantization, provides hybrid 8-bit quantization for all models, speeding up inference by up to 2x and shrinking model size by up to 4x.
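The Speaker Vector feature compares speakers through their embeddings. A common way to do that is cosine similarity plus a decision threshold. The sketch below assumes embeddings are plain lists of floats and uses a made-up 0.7 threshold; it illustrates the general technique only and is not Malaya-Speech's API.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(a: Sequence[float], b: Sequence[float], threshold: float = 0.7) -> bool:
    """Hypothetical decision rule: same speaker if similarity >= threshold.

    The 0.7 default is illustrative only and would need tuning on real data.
    """
    return cosine_similarity(a, b) >= threshold


# Toy 4-dimensional "speaker vectors"; real embeddings are much larger.
alice_1 = [0.9, 0.1, 0.3, 0.0]
alice_2 = [0.8, 0.2, 0.35, 0.05]
bob = [0.0, 0.9, -0.2, 0.4]

print(round(cosine_similarity(alice_1, alice_2), 3))
print(same_speaker(alice_1, alice_2), same_speaker(alice_1, bob))
```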
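The up-to-4x size reduction for 8-bit quantization follows from storage width: a float32 weight takes 4 bytes and an int8 weight takes 1. The sketch below assumes "hybrid" means the usual scheme where weights are stored as int8 while activations stay in floating point, and shows only symmetric per-tensor weight quantization. It is a generic illustration, not Malaya-Speech's quantization code.

```python
import struct
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[bytes, float]:
    """Symmetric per-tensor quantization of float weights to int8.

    Returns the packed int8 bytes and the scale needed to recover
    approximate float values (w ~= q * scale).
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Map each weight to the nearest integer step, clipped to [-127, 127].
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return struct.pack(f"{len(q)}b", *q), scale


def dequantize_int8(packed: bytes, scale: float) -> List[float]:
    """Recover approximate float weights from int8 bytes and the scale."""
    q = struct.unpack(f"{len(packed)}b", packed)
    return [v * scale for v in q]


weights = [0.52, -1.27, 0.003, 0.9, -0.41, 1.1]
float32_bytes = struct.pack(f"{len(weights)}f", *weights)
int8_bytes, scale = quantize_int8(weights)

print(len(float32_bytes), len(int8_bytes))  # 24 vs 6 bytes: 4x smaller
restored = dequantize_int8(int8_bytes, scale)
# Rounding error per weight is at most scale / 2.
print(max(abs(a - b) for a, b in zip(weights, restored)))
```

The speed-up side of the claim depends on the runtime's int8 kernels and is not something this storage-only sketch can show.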

Pretrained Models

Malaya-Speech also releases pretrained models; see malaya-speech/pretrained-model.

References

If you use our software for research, please cite:

@misc{Malaya,
  abstract = {Speech-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow},
  author = {Husein, Zolkepli},
  title = {Malaya-Speech},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/huseinzol05/malaya-speech}}
}

Acknowledgement

Thanks to KeyReply for sponsoring the private cloud used to train Malaya-Speech models; without it, this library would have collapsed entirely.


Releases
  • 1.3.0 (Sep 18, 2022)

    1. Added GPT2 LM combined with pyctcdecoder, https://malaya-speech.readthedocs.io/en/latest/gpt2-lm.html
    2. Added Mask LM combined with pyctcdecoder, https://malaya-speech.readthedocs.io/en/latest/masked-lm.html
    3. Added Transducer with GPT2 LM beam decoder, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-lm-gpt2.html
    4. Added Transducer with Mask LM beam decoder, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-lm-gpt2.html
    5. Added GPT2 LM CTC decoder, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model-pyctcdecode-gpt2.html
    6. Added Mask LM CTC decoder, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model-pyctcdecode-mlm.html
    7. Added Squeezeformer transducer models.
    8. Added End-to-End FastSpeech2 TTS models that no longer require a vocoder, https://malaya-speech.readthedocs.io/en/latest/tts-e2e-fastspeech2.html
    9. Added End-to-End VITS TTS models that no longer require a vocoder, https://malaya-speech.readthedocs.io/en/latest/tts-vits.html
    10. Added Neural Vocoder Super Resolution models, https://malaya-speech.readthedocs.io/en/latest/load-super-resolution-tfgan.html
    11. Added super resolution diffusion models, https://malaya-speech.readthedocs.io/en/latest/load-super-resolution-audio-diffusion.html
    12. Added HMM speaker diarization, https://malaya-speech.readthedocs.io/en/latest/load-diarization-clustering-hmm.html
  • 1.2.7 (Jun 13, 2022)

    1. Added Speech-to-Text HuggingFace using Mesolitica finetuned models, https://huggingface.co/mesolitica, https://malaya-speech.readthedocs.io/en/latest/stt-huggingface.html
    2. Added Force Alignment HuggingFace using Mesolitica finetuned models, https://huggingface.co/mesolitica, https://malaya-speech.readthedocs.io/en/latest/stt-huggingface.html
    3. Added Text-to-Speech LightSpeech, https://arxiv.org/abs/2102.04040, https://malaya-speech.readthedocs.io/en/latest/tts-lightspeech-model.html
    4. Transducer LM now supports multiple languages.
  • 1.2.6 (May 6, 2022)

    1. Use HuggingFace as the backend repository.
    2. Added yasmin and osman speakers for TTS Tacotron2, https://malaya-speech.readthedocs.io/en/latest/tts-tacotron2-model.html
    3. Added yasmin and osman speakers for TTS FastSpeech2, https://malaya-speech.readthedocs.io/en/latest/tts-fastspeech2-model.html
    4. Added yasmin and osman speakers for TTS GlowTTS, https://malaya-speech.readthedocs.io/en/latest/tts-glowtts-model.html
    5. Use yasmin and osman speakers for long text TTS, https://malaya-speech.readthedocs.io/en/latest/tts-long-text.html
  • 1.2.5 (Mar 20, 2022)

  • 1.2.4 (Mar 1, 2022)

    1. Added Malay-language pretrained BEST-RQ models, https://github.com/huseinzol05/malaya-speech/tree/master/pretrained-model/stt/best_rq
    2. Added BEST-RQ STT, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model.html#List-available-CTC-model
  • 1.2.2 (Dec 29, 2021)

  • 1.2.1 (Dec 2, 2021)

    1. Added more KenLM models, including Malay + Singlish, https://malaya-speech.readthedocs.io/en/latest/ctc-language-model.html
    2. Improved ASR CTC models, Hubert-Conformer-Large achieved 12.8% WER-LM, 3.8% CER-LM, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model.html
    3. Added CTC Decoders interface for ASR CTC models, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model-ctc-decoders.html
    4. Added pyctcdecode interface for ASR CTC models, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model-pyctcdecode.html
    5. Improved ASR RNNT models, large-conformer achieved 14.8% WER-LM, 5.9% CER-LM, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model.html
    6. Added KenLM support for ASR RNNT models, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-lm.html
    7. Added ASR RNNT for 2 mixed languages, Malay and Singlish, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-lm.html#
    8. Added ASR RNNT for 3 mixed languages, Malay, Singlish and Mandarin, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-3mixed.html
    9. Added GlowTTS Text-to-Speech, https://malaya-speech.readthedocs.io/en/latest/tts-glowtts-model.html
    10. Added GlowTTS Text-to-Speech Multispeakers, https://malaya-speech.readthedocs.io/en/latest/tts-glowtts-multispeaker-model.html
    11. Added HiFiGAN Vocoder, https://malaya-speech.readthedocs.io/en/latest/load-vocoder.html
    12. Added Universal HiFiGAN Vocoder, https://malaya-speech.readthedocs.io/en/latest/load-universal-hifigan.html
  • 1.2 (Oct 2, 2021)

    1. Added HuBERT, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model.html, new SOTA on Malay CER.
    2. Improved the Singlish TTS model, which now supports Universal MelGAN as a vocoder, https://malaya-speech.readthedocs.io/en/latest/tts-singlish.html
    3. Added a Force Alignment module; you can now generate time-aligned transcriptions, https://malaya-speech.readthedocs.io/en/latest/force-alignment.html
    4. Improved Mixed STT Transducer models, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-mixed.html
    5. Added new SOTA Mixed STT models called conformer-stack-mixed, 2% better than the other Mixed STT models (no paper produced), https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-mixed.html#List-available-RNNT-model
    6. Added Singlish STT Transducer models, thanks to the Singapore National Speech Corpus for the dataset, https://www.imda.gov.sg/programme-listing/digital-services-lab/national-speech-corpus, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-singlish.html
  • 1.1.1 (Jun 29, 2021)

    1. Improved Bahasa Speech-to-Text; the Large Conformer beats Google Speech-to-Text in accuracy.
    2. Improved Mixed (Malay and Singlish) Speech-to-Text.
    3. Added real-time Mixed (Malay and Singlish) Speech-to-Text documentation, https://malaya-speech.readthedocs.io/en/latest/realtime-asr-mixed.html
  • 1.1 (Jun 1, 2021)

  • 1.0 (Apr 18, 2021)
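The release notes quote WER and CER figures. Both are conventionally defined as an edit distance (substitutions + deletions + insertions) divided by the length of the reference, counted in words for WER and in characters for CER. The sketch below computes them from scratch; the release notes do not say how Malaya-Speech normalizes text before scoring (casing, punctuation, whether spaces count for CER), so treat this as a generic definition rather than the project's evaluation script.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


ref = "saya suka makan nasi lemak"
hyp = "saya suka makan nasik lemak"  # one misrecognized word
print(f"WER = {wer(ref, hyp):.3f}, CER = {cer(ref, hyp):.3f}")
```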
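Several releases add decoders for the CTC models (CTC Decoders, pyctcdecode, KenLM, GPT2 and Mask LM). Those perform beam search, often with a language model; the simplest baseline they improve on is greedy (best-path) CTC decoding, sketched below. The toy vocabulary and the choice of index 0 as the blank symbol are assumptions for illustration, not Malaya-Speech's actual vocabulary or decoder.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]],
                      vocab: Sequence[str],
                      blank: int = 0) -> str:
    """Best-path CTC decoding.

    1. Pick the most probable symbol in each frame.
    2. Collapse consecutive repeats of the same symbol.
    3. Drop the blank symbol.
    """
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out: List[str] = []
    prev = None
    for idx in best:
        if idx != prev and idx != blank:
            out.append(vocab[idx])
        prev = idx  # a blank resets prev, so "a, blank, a" yields "aa"
    return "".join(out)


# Hypothetical 4-symbol vocabulary; index 0 is the CTC blank.
vocab = ["<blank>", "a", "k", "n"]
# Per-frame probabilities over `vocab` for 7 audio frames.
frames = [
    [0.1, 0.1, 0.7, 0.1],     # k
    [0.1, 0.1, 0.7, 0.1],     # k again, collapsed
    [0.2, 0.6, 0.1, 0.1],     # a
    [0.8, 0.1, 0.05, 0.05],   # blank
    [0.1, 0.1, 0.1, 0.7],     # n
    [0.1, 0.1, 0.1, 0.7],     # n again, collapsed
    [0.9, 0.05, 0.03, 0.02],  # blank
]
print(ctc_greedy_decode(frames, vocab))
```

A beam-search decoder keeps several candidate prefixes per frame instead of one, which is what lets an external language model rescore hypotheses.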

Owner
HUSEIN ZOLKEPLI