A Python/PyTorch app for easily synthesising human voices

Last update: Jan 04, 2023

Overview

Voice Cloning App

Documentation

Discord Server

Video guide

Voice Sharing Hub

FAQs

System Requirements

Windows 10 or Ubuntu 20.04+ operating system
5GB+ Disk space
NVIDIA GPU with at least 4GB of memory & driver version 456.38+ (optional)
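
The disk and GPU figures above can be checked before installing. The following is a minimal sketch, not part of the app: the function names are hypothetical, "GB" is interpreted as GiB, and the GPU lookup assumes PyTorch is already installed (it returns None otherwise, which is treated as "no GPU", since the GPU is optional).

```python
import shutil

MIN_DISK_BYTES = 5 * 1024**3  # "5GB+ Disk space" (interpreted as GiB, an assumption)
MIN_GPU_BYTES = 4 * 1024**3   # "at least 4GB of memory" (same assumption)


def meets_requirements(free_disk_bytes, gpu_memory_bytes=None):
    """Return (ok, problems). gpu_memory_bytes=None means no GPU, which is allowed."""
    problems = []
    if free_disk_bytes < MIN_DISK_BYTES:
        problems.append("less than 5GB of free disk space")
    # The GPU is optional, so only flag it when one is present but too small.
    if gpu_memory_bytes is not None and gpu_memory_bytes < MIN_GPU_BYTES:
        problems.append("GPU has less than 4GB of memory")
    return (not problems, problems)


def detect_gpu_memory():
    """Total memory of the first CUDA device in bytes, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory


if __name__ == "__main__":
    free = shutil.disk_usage(".").free
    ok, problems = meets_requirements(free, detect_gpu_memory())
    print("OK" if ok else "Problems: " + "; ".join(problems))
```

The driver version requirement is not covered by this sketch.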

Key features

Automatic dataset generation (with support for subtitles and audiobooks)
Additional language support
Local & remote training
Easy train start/stop
Data importing/exporting
Multi GPU support

Manual Guides

Future Improvements

Add support for TalkNet
Add GTA alignment for HiFi-GAN
Improved batch size estimation
AMD GPU support

Other resources

Try out existing voices at uberduck.ai and Vocodes
YouTube data fetching (created by Diskr33t#5880)
Synthesize in Colab (created by mega b#6696)
Generate YouTube transcription (created by mega b#6696)
Wit.ai transcription

Acknowledgements

This project uses a reworked version of Tacotron2. All rights belong to NVIDIA, and use follows the requirements of their BSD-3 licence.

Additionally, the project uses DSAlign, Silero, DeepSpeech & HiFi-GAN.

Thank you to Dr. John Bustard at Queen's University Belfast for his support throughout the project.

Supported by uberduck.ai; reach out to them for live model hosting.

Also a big thanks to the members of the VocalSynthesis subreddit for their feedback.

Finally, thank you to everyone raising issues and contributing to the project.

Comments

Transcription error: wav file is empty
Hello

I am running the Voice-Cloning-App.exe on Windows 10. I have a GeForce RTX 2060 Graphics Card with the GeForce Game Ready Driver Version 461.92.

When I attempt to build the dataset, the Windows console stops after the following:

[12644] WARNING: file already exists but should not: C:\Users\GREGOR~1\AppData\Local\Temp\_MEI126442\torch\_C.cp38-win_amd64.pyd
Server initialized for threading.
Server initialized for threading.
pydub\utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
torchaudio\extension\extension.py:14: UserWarning: torchaudio C++ extension is not available.
torchaudio\backend\utils.py:63: UserWarning: The interface of "soundfile" backend is planned to change in 0.8.0 to match that of "sox_io" backend and the current interface will be removed in 0.9.0. To use the new interface, do torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False before setting the backend to "soundfile". Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
INFO:matplotlib.font_manager:Generating new fontManager, this may take some time...
[nltk_data] Downloading package wordnet to C:\Users\GREGOR~1\AppData\L
[nltk_data] ocal\Temp\_MEI126442\nltk_data...
[nltk_data] Package wordnet is already up-to-date!
WARNING:werkzeug:WebSocket transport not available. Install eventlet or gevent and gevent-websocket for improved performance.

Serving Flask app "main" (lazy loading)
Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.

Debug mode: off INFO:werkzeug: * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit) INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:56:25] "GET / HTTP/1.1" 200 - INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:56:57] "POST / HTTP/1.1" 200 - INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:56:57] "GET /static/error.css HTTP/1.1" 200 - INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:56:57] "GET /favicon.ico HTTP/1.1" 200 - INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:57:11] "GET / HTTP/1.1" 200 - Starting Thread INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:57:42] "POST / HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet OPEN data {'sid': 'qINJoZN0iSsAW66FAAAA', 'upgrades': [], 'pingTimeout': 5000, 'pingInterval': 25000} INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet OPEN data {'sid': 'qINJoZN0iSsAW66FAAAA', 'upgrades': [], 'pingTimeout': 5000, 'pingInterval': 25000} INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:57:42] "GET /socket.io/?EIO=4&transport=polling&t=NXjKmkr HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Received packet MESSAGE data 0/voice, INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet MESSAGE data 0/voice, qINJoZN0iSsAW66FAAAA: Sending packet MESSAGE data 0/voice,{"sid":"hvDlhnRAa1GAVtomAAAB"} INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet MESSAGE data 0/voice,{"sid":"hvDlhnRAa1GAVtomAAAB"} INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:57:42] "POST /socket.io/?EIO=4&transport=polling&t=NXjKmlA&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:57:42] "GET /socket.io/?EIO=4&transport=polling&t=NXjKmlB&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - emitting event "logs" to all [/voice] INFO:socketio.server:emitting event "logs" to all [/voice] qINJoZN0iSsAW66FAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Loading audio from data\datasets\JamesEarlJones\audio.mp3..."}] INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:57:47] "GET /socket.io/?EIO=4&transport=polling&t=NXjKmlb&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 
- INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Loading audio from data\datasets\JamesEarlJones\audio.mp3..."}] INFO:voice:Loading audio from data\datasets\JamesEarlJones\audio.mp3... emitting event "logs" to all [/voice] INFO:socketio.server:emitting event "logs" to all [/voice] qINJoZN0iSsAW66FAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Loading script from data\datasets\JamesEarlJones\text.txt..."}] INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:02] "GET /socket.io/?EIO=4&transport=polling&t=NXjKnxd&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Loading script from data\datasets\JamesEarlJones\text.txt..."}] INFO:voice:Loading script from data\datasets\JamesEarlJones\text.txt... emitting event "logs" to all [/voice] INFO:socketio.server:emitting event "logs" to all [/voice] qINJoZN0iSsAW66FAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Fetching segments..."}] INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Fetching segments..."}] INFO:voice:Fetching segments... 
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:02] "GET /socket.io/?EIO=4&transport=polling&t=NXjKrgH&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:07] "GET /socket.io/?EIO=4&transport=polling&t=NXjKrgS&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:07] "POST /socket.io/?EIO=4&transport=polling&t=NXjKsst&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - emitting event "logs" to all [/voice] INFO:socketio.server:emitting event "logs" to all [/voice] qINJoZN0iSsAW66FAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Transcribing segments..."}] INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:20] "GET /socket.io/?EIO=4&transport=polling&t=NXjKssu&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Transcribing segments..."}] INFO:voice:Transcribing segments... Using cache found in C:\Users\Gregory Betsey/.cache\torch\hub\snakers4_silero-models_master torchaudio\backend\utils.py:63: UserWarning: The interface of "soundfile" backend is planned to change in 0.8.0 to match that of "sox_io" backend and the current interface will be removed in 0.9.0. To use the new interface, do torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False before setting the backend to "soundfile". Please refer to https://github.com/pytorch/audio/issues/903 for the detail. 
Exception in thread Thread-13:
Traceback (most recent call last):
  File "application\utils.py", line 47, in background_task
    max_seqlength = max(max([len(_) for _ in batch]), 12800)
  File "application\utils.py", line 32, in create_dataset
    if wav.size(0) > 1:
  File "dataset\forced_alignment\align.py", line 123, in align
  File "dataset\transcribe.py", line 34, in stt
  File "dataset\transcribe.py", line 16, in transcribe
  File "torch\hub.py", line 370, in load
  File "torch\hub.py", line 399, in _load_local
  File "C:\Users\Gregory Betsey/.cache\torch\hub\snakers4_silero-models_master\hubconf.py", line 24, in silero_stt
    model, decoder = init_jit_model(model_url=models.stt_models.get(language).latest.jit,
  File "C:\Users\Gregory Betsey/.cache\torch\hub\snakers4_silero-models_master\utils.py", line 135, in init_jit_model
    model = torch.jit.load(model_path, map_location=device)
  File "torch\jit\_serialization.py", line 161, in load
RuntimeError: [enforce fail at ..\caffe2\serialize\inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "threading.py", line 932, in _bootstrap_inner
  File "threading.py", line 870, in run
  File "application\utils.py", line 50, in background_task
    inputs[i, :len(wav)].copy(wav)
NameError: name 'traceback' is not defined
qINJoZN0iSsAW66FAAAA: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:32] "GET /socket.io/?EIO=4&transport=polling&t=NXjKvy4&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None
qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:32] "POST /socket.io/?EIO=4&transport=polling&t=NXjKyzw&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
qINJoZN0iSsAW66FAAAA: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:57] "GET /socket.io/?EIO=4&transport=polling&t=NXjKyzw.0&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None
qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:57] "POST /socket.io/?EIO=4&transport=polling&t=NXjL358&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
qINJoZN0iSsAW66FAAAA: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:59:22] "GET /socket.io/?EIO=4&transport=polling&t=NXjL358.0&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None
qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:59:22] "POST /socket.io/?EIO=4&transport=polling&t=NXjL9CA&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
qINJoZN0iSsAW66FAAAA: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:59:47] "GET 
/socket.io/?EIO=4&transport=polling&t=NXjL9CB&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:59:47] "POST /socket.io/?EIO=4&transport=polling&t=NXjLFJ8&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:00:12] "GET /socket.io/?EIO=4&transport=polling&t=NXjLFJ8.0&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:00:12] "POST /socket.io/?EIO=4&transport=polling&t=NXjLLQ8&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:00:37] "GET /socket.io/?EIO=4&transport=polling&t=NXjLLQ9&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:00:37] "POST /socket.io/?EIO=4&transport=polling&t=NXjLRWz&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:01:02] "GET /socket.io/?EIO=4&transport=polling&t=NXjLRW-&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:01:02] "POST /socket.io/?EIO=4&transport=polling&t=NXjLXe3&sid=qINJoZN0iSsAW66FAAAA 
HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:01:27] "GET /socket.io/?EIO=4&transport=polling&t=NXjLXe4&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:01:27] "POST /socket.io/?EIO=4&transport=polling&t=NXjLdkv&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:01:52] "GET /socket.io/?EIO=4&transport=polling&t=NXjLdkv.0&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:01:52] "POST /socket.io/?EIO=4&transport=polling&t=NXjLjrp&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:02:17] "GET /socket.io/?EIO=4&transport=polling&t=NXjLjrq&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:02:17] "POST /socket.io/?EIO=4&transport=polling&t=NXjLpyh&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:02:42] "GET /socket.io/?EIO=4&transport=polling&t=NXjLpyh.0&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data 
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:02:42] "POST /socket.io/?EIO=4&transport=polling&t=NXjLw3g&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:03:07] "qINJoZN0iSsAW66FAAAA: Received packet CLOSE data GET /socket.io/?EIO=4&transport=polling&t=NXjLw3g.0&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1qINJoZN0iSsAW66FAAAA: Client is gone, closing socket Error.txt
bug
opened by GregoryBetsey 58
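
The `failed finding central directory` error in the traceback above is raised when the file handed to `torch.jit.load` cannot be read as a zip archive, which often points to an incomplete or corrupted model download. A minimal sketch of how a cached file could be checked and removed so that it is downloaded again follows. The helper names are hypothetical, and which cached file is affected is not stated in the issue, so the path must be supplied by the user.

```python
import os
import zipfile


def is_readable_archive(path):
    """True if the file exists, is non-empty, and has a zip central directory."""
    return (
        os.path.isfile(path)
        and os.path.getsize(path) > 0
        and zipfile.is_zipfile(path)
    )


def remove_if_corrupt(path):
    """Delete a file that exists but is not a valid zip archive.

    Returns True if the file was removed, so the caller knows a fresh
    download is needed.
    """
    if os.path.isfile(path) and not is_readable_archive(path):
        os.remove(path)
        return True
    return False
```

This only detects files that are not zip archives at all; it does not verify that a model inside a valid archive is complete.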
GPU Memory exhausted (training)
Label for this issue is help wanted.

Hardware

Processor: Intel Core i9 - 9900K CPU @ 3.60GHz Installed RAM: 64.0 GB GPU: NVIDIA GeForce RTX 2070 - 8GB

Attempt

Failure

Tried Steps

Epoch selection:

8000

5000

3000

1000 and 10 iterations

Steps that might help to resolve

I found some articles that might be helpful to resolve this:

https://pytorch.org/docs/stable/notes/faq.html

https://forums.fast.ai/t/clearing-gpu-memory-pytorch/14637/4

Questions

Would you suggest reducing the iteration count from 1000 to a smaller number? If so, what number would be ideal?

Would you suggest reducing the epochs from 8000 to a smaller number? If so, what number would be ideal?

bug
opened by vinamramunot-tech 15
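
On the questions above: in PyTorch training, GPU memory use generally depends on the batch size (and the length of the samples in a batch) rather than on the number of epochs or iterations, so lowering the epoch count alone is unlikely to resolve the error. The sketch below shows one common approach, halving the batch size until a step fits. `train_step` is a hypothetical callable standing in for one training step; this is not the app's own batch size estimation.

```python
def find_workable_batch_size(train_step, start_batch_size, min_batch_size=1):
    """Halve the batch size until train_step(batch_size) runs without an
    out-of-memory RuntimeError, and return the size that worked."""
    batch_size = start_batch_size
    while batch_size >= min_batch_size:
        try:
            train_step(batch_size)
            return batch_size
        except RuntimeError as error:
            # CUDA out-of-memory failures surface as RuntimeError with this
            # phrase in the message; anything else is re-raised unchanged.
            if "out of memory" not in str(error):
                raise
            # With real PyTorch code, torch.cuda.empty_cache() could be
            # called here before retrying.
            batch_size //= 2
    raise RuntimeError("out of memory even at the minimum batch size")
```

For example, a step that only fits 8 samples would make a search starting at 64 try 64, 32, 16 and then succeed at 8.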
HTTPError

First time user - after uploading .txt and .wav I get the following error. Any guidance?

Type: HTTPError
Text: HTTP Error 503: Service Temporarily Unavailable
Full: Traceback (most recent call last):
  File "flask\app.py", line 1950, in full_dispatch_request
  File "flask\app.py", line 1936, in dispatch_request
  File "application\views.py", line 99, in create_dataset_post
  File "dataset\transcribe.py", line 85, in create_transcription_model
  File "dataset\transcribe.py", line 58, in __init__
  File "torch\hub.py", line 364, in load
    model = _load_local(repo_or_dir, model, *args, **kwargs)
  File "torch\hub.py", line 393, in _load_local
    model = entry(*args, **kwargs)
  File "C:\Users\Sam London/.cache\torch\hub\snakers4_silero-models_master\hubconf.py", line 28, in silero_stt
    **kwargs)
  File "C:\Users\Sam London/.cache\torch\hub\snakers4_silero-models_master\utils.py", line 128, in init_jit_model
    progress=True)
  File "torch\hub.py", line 419, in download_url_to_file
    u = urlopen(req)
  File "urllib\request.py", line 223, in urlopen
  File "urllib\request.py", line 532, in open
  File "urllib\request.py", line 642, in http_response
  File "urllib\request.py", line 570, in error
  File "urllib\request.py", line 504, in _call_chain
  File "urllib\request.py", line 650, in http_error_default
urllib.error.HTTPError: HTTP Error 503: Service Temporarily Unavailable

opened by ManBearPig87 13
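
An HTTP 503 means the server hosting the model was temporarily unavailable when the download was attempted, so trying again later is a reasonable first step. As an illustration only, a retry with exponential backoff could look like the sketch below. `retry_on_http_503` and its parameters are hypothetical and not part of the app or of torch.hub.

```python
import time
import urllib.error


def retry_on_http_503(action, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call action() and retry with exponential backoff on HTTP 503.

    Any other HTTP error, or a 503 on the final attempt, is re-raised.
    """
    for attempt in range(attempts):
        try:
            return action()
        except urllib.error.HTTPError as error:
            is_last_attempt = attempt == attempts - 1
            if error.code != 503 or is_last_attempt:
                raise
            # Wait 1s, 2s, 4s, ... (scaled by base_delay) between attempts.
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists so the waiting can be replaced in tests; in normal use it can be left at its default.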
Training unexpectedly crashes

I have my wavs folder and my metadata.csv, and everything works except training. When I start training, it loads all the data (I can see this in the "embedded console"), and then after a few seconds it opens a new tab in my browser showing the main page (localhost:5000). If I switch back to the training tab and wait, it doesn't make any progress. Can you help me with this, please?
bug

opened by gbh4x 12
Error fetching silero model

Full: Traceback (most recent call last):
  File "C:\Users\superuser\Anaconda3\lib\site-packages\flask\app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\superuser\Anaconda3\lib\site-packages\flask\app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "C:\Users\superuser\Desktop\Voice-Cloning-App\application\views.py", line 249, in synthesis_post
    alertnative_words=get_alternative_word_suggestions(audio_path, text),
  File "C:\Users\superuser\Desktop\Voice-Cloning-App\synthesis\synonyms.py", line 29, in get_alternative_word_suggestions
    poor_words = evalulate_audio(audio, text)
  File "C:\Users\superuser\Desktop\Voice-Cloning-App\synthesis\synonyms.py", line 20, in evalulate_audio
    results = transcribe(audio)
  File "C:\Users\superuser\Desktop\Voice-Cloning-App\dataset\transcribe.py", line 27, in transcribe
    repo_or_dir="snakers4/silero-models", model="silero_stt", language="en", device=device
  File "C:\Users\superuser\Anaconda3\lib\site-packages\torch\hub.py", line 370, in load
    model = _load_local(repo_or_dir, model, *args, **kwargs)
  File "C:\Users\superuser\Anaconda3\lib\site-packages\torch\hub.py", line 399, in _load_local
    model = entry(*args, **kwargs)
  File "C:\Users\superuser/.cache\torch\hub\snakers4_silero-models_master\hubconf.py", line 25, in silero_stt
    **kwargs)
  File "C:\Users\superuser/.cache\torch\hub\snakers4_silero-models_master\utils.py", line 133, in init_jit_model
    progress=True)
  File "C:\Users\superuser\Anaconda3\lib\site-packages\torch\hub.py", line 445, in download_url_to_file
    unit='B', unit_scale=True, unit_divisor=1024) as pbar:
  File "C:\Users\superuser\Anaconda3\lib\site-packages\tqdm\_tqdm.py", line 662, in __init__
    TqdmKeyError("Unknown argument(s): " + str(kwargs)))
tqdm._tqdm.TqdmKeyError: "Unknown argument(s): {'unit_divisor': 1024}"
bug

opened by CrazyPlaysHD 9
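
The traceback above shows torch.hub passing `unit_divisor=1024` to tqdm and tqdm rejecting it, which suggests the installed tqdm predates that argument; upgrading tqdm is a likely fix, though the issue does not confirm it. A small sketch for checking whether a callable declares a given parameter by name is shown below. `has_named_parameter` is a hypothetical helper. It deliberately ignores `**kwargs`, because the tqdm in the traceback accepts arbitrary keywords and then raises on the ones it does not know.

```python
import inspect


def has_named_parameter(func, name):
    """True if func declares a parameter called `name` explicitly.

    A catch-all **kwargs does not count: a function can accept any keyword
    syntactically and still reject it at runtime.
    """
    return name in inspect.signature(func).parameters


# Possible usage, assuming tqdm is installed:
#   import tqdm
#   has_named_parameter(tqdm.tqdm.__init__, "unit_divisor")
```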
Force Align text and Audio (dataset)
Hi @BenAAndrew, I am on the step where I am trying to align the text and audio of an audiobook. I acquired the audio and text from Amazon Audible. Unfortunately, I was not able to assign the help label to this issue; I don't think I have permission for that.

[X] using virtualenv

In order to work through align.py, I had to modify it. After modifying I was able to run the file. Below is the modified part of the file. Also in the screenshot category I have mentioned how I am trying to execute this file.

import os
import sys
import json
import logging
import argparse
from pydub import AudioSegment
sys.path.append(".")
from search import FuzzySearch
from audio import DEFAULT_RATE, read_frames_from_file, vad_split
from dataset.transcribe import stt

Screenshots

Failure Point

Questions

Do you suggest using a virtualenv?

Do I need to reduce the quality of the WAV file or the MP3 file?

Link to the dataset

Audio Dataset

I have the book.txt and the MP3 file. I converted the MP3 to a WAV file before trying to run the alignment. Please let me know if you can try using my dataset. Thanks in advance for the help.
bug question
opened by vinamramunot-tech 9
All unlabeled clips unplayable in the Manage datasets interface for a freshly built dataset.

This is with VCA release 1.0.2 using the compiled executable on Windows 10. The play option is simply grayed out for all listed clips. I have checked: all the listed clips are playable from File Explorer and, for this dataset, range in length from 1 to 6 seconds.
bug

opened by RayDAnt3D 8
Imported model cannot be loaded

I had exported a model beforehand (using a Google Colab notebook).

Type: ValueError
Text: invalid literal for int() with base 10: 'checkpoints'
Full: Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.7/dist-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/content/Voice-Cloning-App/application/views.py", line 391, in download_model
    model_path = get_latest_checkpoint(os.path.join(paths["models"], model_name))
  File "/content/Voice-Cloning-App/training/checkpoint.py", line 27, in get_latest_checkpoint
    if int(checkpoint.split("_")[1].split(".")[0]) > int(latest_checkpoint.split("_")[1].split(".")[0]):
ValueError: invalid literal for int() with base 10: 'checkpoints'
bug

opened by ericstheguy 8
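
The ValueError above comes from calling `int()` on part of a name that is not a number ('checkpoints'), so an entry in the model folder did not follow the expected checkpoint naming. One defensive approach is to skip names that do not match the pattern instead of parsing every entry. The sketch below assumes checkpoints are named like `checkpoint_<iteration>`, which is an assumption rather than something the issue confirms; it is not the app's actual `get_latest_checkpoint`.

```python
import re

# Assumed naming scheme: "checkpoint_" followed by the iteration number.
CHECKPOINT_PATTERN = re.compile(r"^checkpoint_(\d+)$")


def get_latest_checkpoint_name(names):
    """Return the name with the highest iteration number, or None.

    Entries that do not match the expected pattern are ignored rather than
    passed to int().
    """
    best_name, best_iteration = None, -1
    for name in names:
        match = CHECKPOINT_PATTERN.match(name)
        if not match:
            continue
        iteration = int(match.group(1))
        if iteration > best_iteration:
            best_name, best_iteration = name, iteration
    return best_name
```

In practice `names` could come from `os.listdir` on the model folder.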

Hang-up when attempting to process dataset on Ampere card [dataset]

Description: The application freezes during the dataset preparation process. I believe this is because the bundled version of PyTorch does not support Ampere cards.

Version used: 0.6.1 (Windows Executable)

OS: Win 10 21H1

Device Specifications: Ryzen 9 3900X, dual RTX 3090s, 32 GB system memory

Nvidia Driver version: 461.92

Log:

INFO:werkzeug: * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
INFO:werkzeug:127.0.0.1 - - [16/Apr/2021 10:27:24] "GET / HTTP/1.1" 200 -
Starting Thread
INFO:werkzeug:127.0.0.1 - - [16/Apr/2021 10:27:48] "POST / HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [16/Apr/2021 10:27:48] "GET /static/application.js HTTP/1.1" 200 -
PQVUzoQCJzc-rrA_AAAA: Sending packet OPEN data {'sid': 'PQVUzoQCJzc-rrA_AAAA', 'upgrades': [], 'pingTimeout': 5000, 'pingInterval': 25000}
INFO:engineio.server:PQVUzoQCJzc-rrA_AAAA: Sending packet OPEN data {'sid': 'PQVUzoQCJzc-rrA_AAAA', 'upgrades': [], 'pingTimeout': 5000, 'pingInterval': 25000}
INFO:werkzeug:127.0.0.1 - - [16/Apr/2021 10:27:48] "GET /socket.io/?EIO=4&transport=polling&t=NZQ0WEU HTTP/1.1" 200 -
PQVUzoQCJzc-rrA_AAAA: Received packet MESSAGE data 0/voice,
INFO:engineio.server:PQVUzoQCJzc-rrA_AAAA: Received packet MESSAGE data 0/voice,
PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 0/voice,{"sid":"iBJGih1vKc84-OLJAAAB"}
INFO:engineio.server:PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 0/voice,{"sid":"iBJGih1vKc84-OLJAAAB"}
INFO:werkzeug:127.0.0.1 - - [16/Apr/2021 10:27:48] "POST /socket.io/?EIO=4&transport=polling&t=NZQ0WII&sid=PQVUzoQCJzc-rrA_AAAA HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [16/Apr/2021 10:27:48] "GET /socket.io/?EIO=4&transport=polling&t=NZQ0WII.0&sid=PQVUzoQCJzc-rrA_AAAA HTTP/1.1" 200 -
emitting event "logs" to all [/voice]
INFO:socketio.server:emitting event "logs" to all [/voice]
PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Coverting data\\datasets\\Diana\\audio.mp3..."}]
INFO:werkzeug:127.0.0.1 - - [16/Apr/2021 10:27:53] "GET /socket.io/?EIO=4&transport=polling&t=NZQ0WN2&sid=PQVUzoQCJzc-rrA_AAAA HTTP/1.1" 200 -
INFO:engineio.server:PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Coverting data\\datasets\\Diana\\audio.mp3..."}]
INFO:voice:Coverting data\datasets\Diana\audio.mp3...
emitting event "logs" to all [/voice]
INFO:socketio.server:emitting event "logs" to all [/voice]
PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Loading script from data\\datasets\\Diana\\text.txt..."}]
INFO:werkzeug:127.0.0.1 - - [16/Apr/2021 10:27:56] "GET /socket.io/?EIO=4&transport=polling&t=NZQ0XNM&sid=PQVUzoQCJzc-rrA_AAAA HTTP/1.1" 200 -
INFO:engineio.server:PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Loading script from data\\datasets\\Diana\\text.txt..."}]
INFO:voice:Loading script from data\datasets\Diana\text.txt...
emitting event "logs" to all [/voice]
INFO:socketio.server:emitting event "logs" to all [/voice]
PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Searching text for matching fragments..."}]
INFO:engineio.server:PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Searching text for matching fragments..."}]
INFO:voice:Searching text for matching fragments...
INFO:werkzeug:127.0.0.1 - - [16/Apr/2021 10:27:56] "emitting event "logs" to all [/voice]
GET /socket.io/?EIO=4&transport=polling&t=NZQ0YFA&sid=PQVUzoQCJzc-rrA_AAAA HTTP/1.1" 200 -
INFO:socketio.server:emitting event "logs" to all [/voice]
PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Changing sample rate..."}]
INFO:engineio.server:PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Changing sample rate..."}]
INFO:voice:Changing sample rate...
INFO:werkzeug:127.0.0.1 - - [16/Apr/2021 10:27:56] "GET /socket.io/?EIO=4&transport=polling&t=NZQ0YFR&sid=PQVUzoQCJzc-rrA_AAAA HTTP/1.1" 200 -
emitting event "logs" to all [/voice]
INFO:socketio.server:emitting event "logs" to all [/voice]
PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Fetching segments..."}]
INFO:werkzeug:127.0.0.1 - - [16/Apr/2021 10:27:57] "GET /socket.io/?EIO=4&transport=polling&t=NZQ0YKC&sid=PQVUzoQCJzc-rrA_AAAA HTTP/1.1" 200 -
INFO:engineio.server:PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Fetching segments..."}]
INFO:voice:Fetching segments...
emitting event "logs" to all [/voice]
INFO:socketio.server:emitting event "logs" to all [/voice]
PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Matching segments..."}]
INFO:werkzeug:127.0.0.1 - - [16/Apr/2021 10:27:57] "GET /socket.io/?EIO=4&transport=polling&t=NZQ0YOL&sid=PQVUzoQCJzc-rrA_AAAA HTTP/1.1" 200 -
INFO:engineio.server:PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Matching segments..."}]
INFO:voice:Matching segments...
emitting event "logs" to all [/voice]
INFO:socketio.server:emitting event "logs" to all [/voice]
PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Generating segments..."}]
INFO:engineio.server:PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Generating segments..."}]
INFO:voice:Generating segments...
INFO:werkzeug:127.0.0.1 - - [16/Apr/2021 10:27:57] "GET /socket.io/?EIO=4&transport=polling&t=NZQ0YYk&sid=PQVUzoQCJzc-rrA_AAAA HTTP/1.1" 200 -
Using cache found in C:\Users\thefi/.cache\torch\hub\snakers4_silero-models_master
torch\cuda\__init__.py:104: UserWarning:
GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
If you want to use the GeForce RTX 3090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

bug

opened by LexCybermac 8
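
The warning at the end of the log compares the GPU's compute capability (sm_86) with the architectures the installed PyTorch build was compiled for. The same comparison can be sketched as a small function. `is_capability_supported` is a hypothetical name, and the check mirrors only the exact-match comparison in the warning; it does not model PTX forward compatibility.

```python
def is_capability_supported(capability, arch_list):
    """True if a (major, minor) compute capability has a matching sm_XY entry."""
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


# Values taken from the log above.
BUNDLED_ARCHS = ["sm_37", "sm_50", "sm_60", "sm_61", "sm_70", "sm_75", "compute_37"]
RTX_3090_CAPABILITY = (8, 6)

# With PyTorch installed, the inputs could instead come from
# torch.cuda.get_device_capability(0) and torch.cuda.get_arch_list().
```

With the values from the log, the check reports the RTX 3090 as unsupported by that build, which matches the warning.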

cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous

I'm trying to use my own audio and text, and this is the error I'm getting:

cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous

I'm using the Windows version. The Python version gives a different error:

  File "main.py", line 4, in <module>
    from engineio.async_drivers import threading
ModuleNotFoundError: No module named 'engineio'

I'm new to both worlds. Is there anything I'm missing?
bug

opened by Syed044 8
Invalid characters in text

I tried training with it and got this: Invalid characters in text (for alphabet): ” (RIGHT DOUBLE QUOTATION MARK), (SOFT HYPHEN), ’ (RIGHT SINGLE QUOTATION MARK), “ (LEFT DOUBLE QUOTATION MARK), — (EM DASH). It refused to start. Does this mean the text shouldn't contain punctuation anymore with this version? If I change to this version, will I have to start all over?
question

opened by LeeroyJenkinsss 7
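
The characters listed in the error are typographic variants rather than punctuation in general, so one option is to normalise them to plain ASCII equivalents before training. A minimal sketch follows. The mapping is an illustrative choice (the left single quotation mark is added for symmetry), and whether the app's alphabet accepts the ASCII replacements is an assumption.

```python
# Typographic characters from the error message mapped to ASCII equivalents.
REPLACEMENTS = {
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK (not in the error, added for symmetry)
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u2014": "-",  # EM DASH
    "\u00ad": "",   # SOFT HYPHEN (invisible, so it is simply dropped)
}
_TRANSLATION_TABLE = str.maketrans(REPLACEMENTS)


def normalize_punctuation(text):
    """Replace typographic quotes and dashes with ASCII and drop soft hyphens."""
    return text.translate(_TRANSLATION_TABLE)
```

Running this over the transcript text avoids editing each occurrence by hand.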
Get Audio file via Command

I use this command to get an audio file from the CLI, but the audio file I get doesn't say anything at all; via the GUI it works perfectly. I'm using a more extended alphabet than English.

python synthesis/synthesize.py -m data/models/..... -vm "data/hifigan/vocoder/model.pt" -hc "data/hifigan/vocoder/config.json" -t "test this is a test" -a audio.wav

Any idea? Thanks in advance.

opened by miguelgh65 2
JSONDecodeError

When I go to synthesize and submit, this is the error message that appears:

Text: Expecting value: line 1 column 1 (char 0)
Full: Traceback (most recent call last):
  File "flask\app.py", line 1950, in full_dispatch_request
  File "flask\app.py", line 1936, in dispatch_request
  File "application\views.py", line 302, in synthesis_setup_post
  File "synthesis\vocoders\hifigan.py", line 25, in __init__
  File "json\__init__.py", line 354, in loads
  File "json\decoder.py", line 339, in decode
  File "json\decoder.py", line 357, in raw_decode
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

opened by skydam 1
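
`Expecting value: line 1 column 1 (char 0)` is what `json.loads` reports when its input is empty or does not begin with a JSON value, so the vocoder config file chosen at synthesis setup may be empty or may not be a JSON file at all. The sketch below shows a loader that gives a clearer message in those cases. `load_json_config` is a hypothetical helper, not the app's own code.

```python
import json


def load_json_config(path):
    """Load a JSON config file, with explicit errors for empty or invalid files."""
    with open(path, encoding="utf-8") as handle:
        content = handle.read()
    if not content.strip():
        raise ValueError(f"{path} is empty; expected a JSON config")
    try:
        return json.loads(content)
    except json.JSONDecodeError as error:
        raise ValueError(f"{path} is not valid JSON: {error}") from error
```

Opening the selected config file in a text editor is a quick way to confirm the same thing by hand.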

App is not starting.

I installed all requirements with Python 3.6.15. When I try to start the app, I get this error:

Traceback (most recent call last):
  File "main.py", line 52, in <module>
    from application.views import *  # noqa
  File "/home/libir/Desktop/voiceCloning/Voice-Cloning-App/application/views.py", line 11, in <module>
    from application.utils import (
  File "/home/libir/Desktop/voiceCloning/Voice-Cloning-App/application/utils.py", line 8, in <module>
    import librosa
  File "/home/libir/.pyenv/versions/3.6.15/lib/python3.6/site-packages/librosa/__init__.py", line 12, in <module>
    from . import core
  File "/home/libir/.pyenv/versions/3.6.15/lib/python3.6/site-packages/librosa/core/__init__.py", line 103, in <module>
    from .audio import *  # pylint: disable=wildcard-import
  File "/home/libir/.pyenv/versions/3.6.15/lib/python3.6/site-packages/librosa/core/audio.py", line 12, in <module>
    import resampy
  File "/home/libir/.pyenv/versions/3.6.15/lib/python3.6/site-packages/resampy/__init__.py", line 7, in <module>
    from .core import *
  File "/home/libir/.pyenv/versions/3.6.15/lib/python3.6/site-packages/resampy/core.py", line 9, in <module>
    from .interpn import resample_f_s, resample_f_p
  File "/home/libir/.pyenv/versions/3.6.15/lib/python3.6/site-packages/resampy/interpn.py", line 75, in <module>
    nopython=True,
TypeError: guvectorize() missing 1 required positional argument: 'signature'

I am also using EndeavourOS.
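
The TypeError at the bottom of the traceback is raised while resampy is being imported, at the point where it calls numba's guvectorize decorator. An error like this often indicates that the installed numba and resampy releases do not match, but that is an assumption and is not confirmed anywhere in this thread. A minimal sketch for listing the installed versions so they can be compared against the project's requirements file; the function name installed_versions is hypothetical:

```python
try:
    # Python 3.8+ ships importlib.metadata in the standard library.
    from importlib.metadata import PackageNotFoundError, version
except ImportError:
    # Python 3.6/3.7: fall back to pkg_resources, which comes with setuptools.
    import pkg_resources

    PackageNotFoundError = pkg_resources.DistributionNotFound

    def version(name):
        return pkg_resources.get_distribution(name).version


def installed_versions(names):
    """Return a dict mapping each package name to its version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(["numba", "resampy", "librosa"]).items():
        print(f"{name}: {ver or 'not installed'}")
```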

opened by LibirSoft 3

Is it possible for me to extract a runtime/environment from this?

I can't find a Python runtime/environment in the temp folder for this program. This matters because installing a correct environment to run Tacotron2 is the biggest pain in the world, and I want to use the Voice-Cloning-App environment to run Tacotron2 in another program, without being restricted to using the app. I could not find a single trace of python.exe, but I did find Python-related files, so how is it able to run? What is it doing to call the synthesis script?
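
The _MEI temp folder that appears in the console logs matches the naming pattern PyInstaller's one-file mode uses when it unpacks a bundled application. The executable therefore appears to be a PyInstaller build, though this is an inference from the logs and is not stated by the project. In that kind of bundle the interpreter is loaded by the executable itself instead of through a separate python.exe. A minimal sketch of how a script can report whether it is running from such a bundle; runtime_info is a hypothetical name:

```python
import sys


def runtime_info():
    """Report whether the current process runs from a PyInstaller-style bundle."""
    # PyInstaller sets sys.frozen on bundled apps; a normal interpreter does not.
    frozen = bool(getattr(sys, "frozen", False))
    # In a bundle, sys._MEIPASS points at the folder the files were unpacked into.
    bundle_dir = getattr(sys, "_MEIPASS", None)
    return {"frozen": frozen, "bundle_dir": bundle_dir, "executable": sys.executable}


if __name__ == "__main__":
    print(runtime_info())
```

When run from a regular Python installation, this reports no bundle directory. A reusable environment would still need to be set up from the project's source and requirements.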

opened by FlashlightET 1

Releases(v1.1.1)

v1.1.1(Feb 7, 2022)

Build with GPU support: https://mega.nz/file/4hpgFBAb#6GP3p0n-s5v9KIXmFRqxDst7BPaNhEy14JMlEe0aopY

Notes:
Improve transcription error logging
Update dataset info when manually labelling a clip
Fix quotation problem in synthesis text
Source code(tar.gz)
Source code(zip)
Voice-Cloning-App-1.1.1-cpuonly.exe(450.38 MB)
v1.1.0(Dec 9, 2021)

Build with GPU support: https://mega.nz/file/0woADQSS#vud7UO7Pi-wNsa6eawqqdzcqNoCNGzH50XHDlbQnd0E

Notes:
Add custom vocoder training
Fix data path issue to ensure all files are in the data folder
Source code(tar.gz)
Source code(zip)
Voice-Cloning-App-1.1.0-cpuonly.exe(670.54 MB)
v1.0.4(Nov 29, 2021)

Build with GPU support: https://mega.nz/file/h8wQBDJY#j4jq3PVj4LDhMPeJJwwhod1bVuve9Sf3G6NyYzo-mZg

Notes:
Fix unlabelled clip playback
Add new remote training notebook
Improve invalid symbols error message (thanks to @SirBitesalot)
Fix symbols selection for non-English dataset creation (thanks to @SirBitesalot)
Source code(tar.gz)
Source code(zip)
Voice-Cloning-App-1.0.4-cpuonly.exe(670.51 MB)
v1.0.3(Nov 24, 2021)

Build with GPU support: https://mega.nz/file/Q5Ak2BZL#-AQTJyHO-wh0sGC5IcCX6Hjce6WQvhT7OXaueSZ2ztg

Notes:
Fixed issue with invalid clips being added for manual labelling
Fixed multi-line synthesis results folder naming (thanks @Marclass)
Added support for custom languages to text cleaning to improve non-English dataset quality
Improved dataset validation and errors
Source code(tar.gz)
Source code(zip)
Voice-Cloning-App-1.0.3-cpuonly.exe(670.50 MB)
v1.0.2(Oct 16, 2021)

Build with GPU support: https://mega.nz/file/FkgGASaL#q8hn70t6zn6m_a9UwwwI2RWbwz3BqzaJ6b7nfxucUbg

Notes:
Remove pre-trained model buildup
Add manage dataset option to view dataset info & label unlabelled clips
Source code(tar.gz)
Source code(zip)
Voice-Cloning-App-1.0.2-cpuonly.exe(670.50 MB)
v1.0.1(Oct 5, 2021)

Build with GPU support no longer available

Notes:
Fix training from checkpoint issue
Fix alignment GIF bug
Update navigation bar (including links to docs & discord)
Source code(tar.gz)
Source code(zip)
Voice-Cloning-App-1.0.1-cpuonly.exe(670.50 MB)
v1.0.0(Sep 18, 2021)

Build with GPU support: https://mega.nz/file/xlgXlQSD#Ls1BN_CDDyoL5cqjzBhrh63RTQAa6fUv5DYnDY0Jowo

Notes:
Generate an alignment graph in training to visualise how the model is performing against a test sentence
Export timelapse of alignments to visualise improvement over time
Add sentence length recommendation to synthesis
Source code(tar.gz)
Source code(zip)
Voice-Cloning-App-1.0.0-cpuonly.exe(670.52 MB)
v0.9.9(Sep 15, 2021)

Build with GPU support no longer available

Notes: Package FFmpeg & Silero with the executable to avoid additional downloads
Source code(tar.gz)
Source code(zip)
Voice-Cloning-App-0.9.9-cpuonly.exe(670.52 MB)
v0.9.8(Sep 8, 2021)

Build with GPU support no longer available

Notes:
Improve symbol weights transfer to improve training from an existing model (thanks to @CookiePPP)
Add attention scoring to training
Fix dataset info generation when extending existing datasets
Fix issue with checkpoint selection on the training page
Source code(tar.gz)
Source code(zip)
Voice-Cloning-App-0.9.8-cpuonly.exe(541.88 MB)
v0.9.7(Aug 31, 2021)

Build with GPU support no longer available

Notes:
Add paragraph synthesis option with automatic sentence splitting
Remove clip length requirement in dataset importing
Source code(tar.gz)
Source code(zip)
Voice-Cloning-App-0.9.7-cpuonly.exe(541.88 MB)
v0.9.6(Aug 29, 2021)

Build with GPU support no longer available

Notes:
Add clip combiner and duration options to dataset building
Fix blank line handling in synthesis
Fix checkpoint exporting
Standardise alignment JSON
Source code(tar.gz)
Source code(zip)
Voice-Cloning-App-0.9.6-cpuonly.exe(493.84 MB)
v0.9.5(Aug 18, 2021)

Build with GPU support no longer available

Notes:
Remove waveglow
Update to CUDA 11.1 to support 30 series GPUs
Source code(tar.gz)
Source code(zip)
Voice-Cloning-App-cpuonly-0.9.5.exe(493.83 MB)
v0.9.4(Aug 10, 2021)

Add checkpoint backup system
Add subtitle support
Source code(tar.gz)
Source code(zip)
Voice-Cloning-App-0.9.4.exe(1453.25 MB)
Voice-Cloning-App-cpuonly-0.9.4.exe(298.64 MB)
v0.9.3(Jul 28, 2021)

Add train/test split slider to training
New multi-line synthesis feature
Add max_decoder_steps slider to synthesis
Source code(tar.gz)
Source code(zip)
Voice-Cloning-App-cpuonly.exe(298.11 MB)
Voice-Cloning-App.exe(1453.23 MB)
v0.9.2(Jul 27, 2021)

Remove synonym suggestion
Move vocoder upload to settings
Fix checkpoint selection in training
Source code(tar.gz)
Source code(zip)
Voice-Cloning-App-cpuonly.exe(298.10 MB)
Voice-Cloning-App.exe(1452.57 MB)
v0.9.1(Jul 25, 2021)

Fix English synthesis
Enable checkpoint selection for export, training & synthesis
Source code(tar.gz)
Source code(zip)
Voice-Cloning-App-cpuonly.exe(319.70 MB)
Voice-Cloning-App.exe(1474.75 MB)
v0.9(Jul 24, 2021)

Added support for other languages
Source code(tar.gz)
Source code(zip)
Voice-Cloning-App-cpuonly.exe(319.70 MB)
Voice-Cloning-App.exe(1474.17 MB)
v0.8.4(Jul 23, 2021)

Update Kindle extraction guide
Source code(tar.gz)
Source code(zip)
Voice-Cloning-App-cpuonly.exe(303.95 MB)
Voice-Cloning-App.exe(1458.42 MB)
v0.8.3(Jul 17, 2021)

Add advanced training options for enabling/disabling checkpoint overwriting & multi GPU
Source code(tar.gz)
Source code(zip)
Voice-Cloning-App-cpuonly.exe(303.95 MB)
Voice-Cloning-App.exe(1458.42 MB)
v0.8.2(Jul 2, 2021)

Update Pytorch version and revert CUDA to 10.2
Source code(tar.gz)
Source code(zip)
Voice-Cloning-App-cpuonly.exe(303.62 MB)
Voice-Cloning-App.exe(1458.08 MB)
v0.8.1(Jun 20, 2021)

Add external Google Colab training
Source code(tar.gz)
Source code(zip)
Voice-Cloning-App-cpuonly.exe(266.77 MB)
Voice-Cloning-App.exe(2037.41 MB)
v0.7.5(May 23, 2021)

Change checkpoint loading behavior to always load checkpoints (ignore transfer learning if not needed)
Source code(tar.gz)
Source code(zip)
Voice-Cloning-App.exe(2037.41 MB)
v0.7.4(May 5, 2021)

Fix final checkpoint save
Source code(tar.gz)
Source code(zip)
Voice-Cloning-App.exe(1306.49 MB)
v0.7.3(May 3, 2021)

Fix script importing for dataset combiner
Enable more precise editing of batch size
Fix file renaming in audio processing
Source code(tar.gz)
Source code(zip)
Voice-Cloning-App.exe(1306.49 MB)
v0.7.2(May 2, 2021)

Increase max batch size
Fix final checkpoint save
Auto-remove app cache
Improve dataset import validation
Fix epoch calculation for changes in batch size
Handle invalid characters in text files
Source code(tar.gz)
Source code(zip)
Voice-Cloning-App.exe(1306.49 MB)
v0.7.1(Apr 25, 2021)

Updated quality estimate for transfer learning
Added checkpoint frequency slider
Fixed Hifi-gan vocoder for GPU
Source code(tar.gz)
Source code(zip)
Voice-Cloning-App.exe(1306.49 MB)
v0.7(Apr 24, 2021)

Fixed timestamp creation causing clip generation to fail
Added Hifi-gan vocoder to synthesis
Source code(tar.gz)
Source code(zip)
Voice-Cloning-App.exe(1306.49 MB)
v0.6.3(Apr 23, 2021)

Improved dataset clip generation to produce more clips and of higher quality
Source code(tar.gz)
Source code(zip)
Voice-Cloning-App.exe(1306.48 MB)
v0.6.2(Apr 23, 2021)

Add minimum confidence slider to dataset generation
Check output of FFmpeg command
Source code(tar.gz)
Source code(zip)
Voice-Cloning-App.exe(1306.48 MB)
v0.6.1(Apr 15, 2021)

Add early stopping option to training
Fixed dataset exporting
Source code(tar.gz)
Source code(zip)
Voice-Cloning-App.exe(1382.42 MB)

Owner

Ben Andrew

Developer working on open source Machine Learning, IoT, CAD & electronics projects.

GitHub Repository
