Python interface to the WebRTC Voice Activity Detector

Related tags

Audiopy-webrtcvad
Overview
https://travis-ci.org/wiseman/py-webrtcvad.svg?branch=master

py-webrtcvad

This is a python interface to the WebRTC Voice Activity Detector (VAD). It is compatible with Python 2 and Python 3.

A VAD classifies a piece of audio data as being voiced or unvoiced. It can be useful for telephony and speech recognition.

The VAD that Google developed for the WebRTC project is reportedly one of the best available, being fast, modern and free.

How to use it

  1. Install the webrtcvad module:

    pip install webrtcvad
    
  2. Create a Vad object:

    import webrtcvad
    vad = webrtcvad.Vad()
    
  3. Optionally, set its aggressiveness mode, which is an integer between 0 and 3. 0 is the least aggressive about filtering out non-speech, 3 is the most aggressive. (You can also set the mode when you create the VAD, e.g. vad = webrtcvad.Vad(3)):

    vad.set_mode(1)
    
  4. Give it a short segment ("frame") of audio. The WebRTC VAD only accepts 16-bit mono PCM audio, sampled at 8000, 16000, 32000 or 48000 Hz. A frame must be either 10, 20, or 30 ms in duration:

    # Run the VAD on 10 ms of silence. The result should be False.
    sample_rate = 16000
    frame_duration = 10  # ms
    frame = b'\x00\x00' * int(sample_rate * frame_duration / 1000)
    print 'Contains speech: %s' % (vad.is_speech(frame, sample_rate)
    

See example.py for a more detailed example that will process a .wav file, find the voiced segments, and write each one as a separate .wav.

How to run unit tests

To run unit tests:

pip install -e ".[dev]"
python setup.py test

History

2.0.10

Fixed memory leak. Thank you, bond005!

2.0.9

Improved example code. Added WebRTC license.

2.0.8

Fixed Windows compilation errors. Thank you, xiongyihui!
Owner
John Wiseman
John Wiseman
L-SpEx: Localized Target Speaker Extraction

L-SpEx: Localized Target Speaker Extraction The data configuration and simulation of L-SpEx. The code scripts will be released in the future. Data Gen

Meng Ge 20 Jan 02, 2023
music library manager and MusicBrainz tagger

beets Beets is the media library management system for obsessive music geeks. The purpose of beets is to get your music collection right once and for

beetbox 11.3k Dec 31, 2022
SomaFM Plugin for Kodi

SomaFM XBMC Plugin This description is a bit outdated. You can simply install this addon by browsing the official repositories from within Kodi. Insta

7 Jan 21, 2022
Sync Toolbox - Python package with reference implementations for efficient, robust, and accurate music synchronization based on dynamic time warping (DTW)

Sync Toolbox - Python package with reference implementations for efficient, robust, and accurate music synchronization based on dynamic time warping (DTW)

Meinard Mueller 66 Jan 02, 2023
python wrapper for rubberband

pyrubberband A python wrapper for rubberband. For now, this just provides lightweight wrappers for pitch-shifting and time-stretching. All processing

Brian McFee 106 Nov 28, 2022
SinGlow: Generative Flow for SVS tasks in Tensorflow 2

SinGlow is a part of my Singing voice synthesis system. It can extract features of sound, particularly songs and musics. Then we can use these features (or perfect encoding) for feature migrating tas

Haobo Yang 8 Aug 22, 2022
A python wrapper for REAPER

pyreaper A python wrapper for REAPER (Robust Epoch And Pitch EstimatoR) Installation pip install pyreaper Demonstration notebnook http://nbviewer.jupy

Ryuichi Yamamoto 56 Dec 27, 2022
An AI for Music Generation

An AI for Music Generation

Hao-Wen Dong 1.3k Dec 31, 2022
Deep learning transformer model that generates unique music sequences.

music-ai Deep learning transformer model that generates unique music sequences. Abstract In 2017, a new state-of-the-art was published for natural lan

xacer 6 Nov 19, 2022
A GUI-based audio player with support for a large variety of formats

Miza-Player A GUI-based audio player with support for a large variety of formats, able to play from web-hosted media platforms such as YouTube, includ

Thomas Xin 3 Dec 14, 2022
Music player and music library manager for Linux, Windows, and macOS

Ex Falso / Quod Libet - A Music Library / Editor / Player Quod Libet is a music management program. It provides several different ways to view your au

Quod Libet 1.2k Jan 07, 2023
Stevan KZ 1 Oct 27, 2021
Tradutor de um arquivo MIDI para ser usado em um simulador RISC-V(RARS)

Tradutor_MIDI-RISC-V Tradutor de um arquivo MIDI para ser usado em um simulador RISC-V(RARS) *O resultado sai com essa formatação: nota,duração,nota,d

Gabriel B. G. 4 Sep 02, 2022
This Is Telegram Music UserBot To Play Music Without Being Admin

This Is Telegram Music UserBot To Play Music Without Being Admin

Krishna Kumar 36 Sep 13, 2022
Manipulate audio with a simple and easy high level interface

Pydub Pydub lets you do stuff to audio in a way that isn't stupid. Stuff you might be looking for: Installing Pydub API Documentation Dependencies Pla

James Robert 6.6k Jan 01, 2023
This is a realtime voice translator program which gets input from user at any language and converts it to the desired language that the user asks

This is a realtime voice translator program which gets input from user at any language and converts it to the desired language that the user asks ...

Mohan Ram S 1 Dec 30, 2021
Desktop music recognition application for windows

MusicRecognizer Music recognition application for windows You can choose from which of the devices the recording will be made. If you choose speakers,

Nikita Merzlyakov 28 Dec 13, 2022
Open Sound Strip, Sequence or Record in Audacity

Audacity Tools For Blender Sound editing in Blender Video Sequence Editor with Audacity integrated. Send/receive the full edited sequence or single st

64 Dec 31, 2022
voice assistant made with python that search for covid19 data(like total cases, deaths and etc) in a specific country

covid19-voice-assistant voice assistant made with python that search for covid19 data(like total cases, deaths and etc) in a specific country installi

Miguel 2 Dec 05, 2021
Python module for handling audio metadata

Mutagen is a Python module to handle audio metadata. It supports ASF, FLAC, MP4, Monkey's Audio, MP3, Musepack, Ogg Opus, Ogg FLAC, Ogg Speex, Ogg The

Quod Libet 1.1k Dec 31, 2022