WarpRNNT loss ported in Numba CPU/CUDA for Pytorch

Overview

RNNT loss in Pytorch - Numba JIT compiled (warprnnt_numba) Test-CPU

Warp RNN Transducer Loss for ASR in Pytorch, ported from HawkAaron/warp-transducer and a replica of the stable version in NVIDIA Neural Module repository (NVIDIA NeMo).

NOTE: The code here will have experimental extensions and may be potentially unstable, use the version in NeMo for long term supported loss version of RNNT for PyTorch.

Supported Features

Currently supports :

  1. WarpRNNT loss in pytorch for CPU / CUDA (jit compiled)
  2. FastEmit
  3. Gradient Clipping (from Torch Audio)

Installation

You will need PyTorch (usually the latest version should be used), plus installation of Numba in a Conda environment (pip only environment is untested but may work).

# Follow installation instructions to install pytorch from website (with cuda if required)
conda install -c conda-force numba or conda update -c conda-forge numba (to get latest version)

# Then install this library
pip install --upgrade git+https://github.com/titu1994/warprnnt_numba.git

Usage

Import warprnnt_numba and use RNNTLossNumba. If attempting to use CUDA version of loss, it is advisable to test that your installed CUDA version is compatible with numba version using numba_utils.

There is also included a very slow numpy/pytorch explicit-loop based loss implementation for verification of exact correct results.

import torch
import numpy as np
import warprnnt_numba

# Define the loss function
fastemit_lambda = 0.001  # any float >= 0.0
loss_pt = warprnnt_numba.RNNTLossNumba(blank=4, reduction='sum', fastemit_lambda=fastemit_lambda)

# --------------
# Example usage

device = "cuda"
torch.random.manual_seed(0)

# Assume Batchsize=2, Acoustic Timesteps = 8, Label Timesteps = 5 (including BLANK=BOS token),
# and Vocabulary size of 5 tokens (including RNNT BLANK)
acts = torch.randn(2, 8, 5, 5, device=device, requires_grad=True)
sequence_length = torch.tensor([5, 8], dtype=torch.int32,
                               device=device)  # acoustic sequence length. One element must be == acts.shape[1].

# Let 0 be MASK/PAD value, 1-3 be token ids, and 4 represent RNNT BLANK token
# The BLANK token is overloaded for BOS token as well here, but can be different token.
# Let first sample be padded with 0 (actual length = 3). Loss is computed according to supplied `label_lengths`.
# and gradients for the 4th index onwards (0 based indexing).
labels = torch.tensor([[4, 1, 1, 3, 0], [4, 2, 2, 3, 1]], dtype=torch.int32, device=device)
label_lengths = torch.tensor([3, 4], dtype=torch.int32,
                             device=device)  # Lengths here must be WITHOUT the BOS token.

# If on CUDA, log_softmax is computed internally efficiently (preserving memory and speed)
# Compute it explicitly for CPU, this is done automatically for you inside forward() of the loss.
# -1-th vocab index is RNNT blank token here.
loss_func = warprnnt_numba.RNNTLossNumba(blank=4, reduction='none',
                                         fastemit_lambda=0.0, clamp=0.0)
loss = loss_func(acts, labels, sequence_length, label_lengths)
print("Loss :", loss)
loss.sum().backward()

# When parsing the gradients, look at grads[0] -
# Since it was padded in T (sequence_length=5 < T=8), there are gradients only for grads[0, :5, :, :].
# Since it was padded in U (label_lengths=3+1 < U=5), there are gradeints only for grads[0, :5, :3+1, :].
grads = acts.grad
print("Gradients of activations :")
print(grads)

Tests

Tests will perform CPU only checks if there are no GPUs. If GPUs are present, will run all tests once for cuda:0 as well.

pytest tests/

Requirements

  • pytorch >= 1.10. Older versions might work, not tested.
  • numba - Minimum required version is 0.53.0, preferred is 0.54+.
You might also like...
This Repo is the official CUDA implementation of ICCV 2019 Oral paper for CARAFE: Content-Aware ReAssembly of FEatures

Introduction This Repo is the official CUDA implementation of ICCV 2019 Oral paper for CARAFE: Content-Aware ReAssembly of FEatures. @inproceedings{Wa

Example repository for custom C++/CUDA operators for TorchScript

Custom TorchScript Operators Example This repository contains examples for writing, compiling and using custom TorchScript operators. See here for the

Convert Python 3 code to CUDA code.

Py2CUDA Convert python code to CUDA. Usage To convert a python file say named py_file.py to CUDA, run python generate_cuda.py --file py_file.py --arch

This demo showcase the use of onnxruntime-rs with a GPU on CUDA 11 to run Bert in a data pipeline with Rust.

Demo BERT ONNX pipeline written in rust This demo showcase the use of onnxruntime-rs with a GPU on CUDA 11 to run Bert in a data pipeline with Rust. R

LightSeq is a high performance training and inference library for sequence processing and generation implemented in CUDA
CUDA Python Low-level Bindings

CUDA Python Low-level Bindings

A dead simple python wrapper for darknet that works with OpenCV 4.1, CUDA 10.1

What Dead simple python wrapper for Yolo V3 using AlexyAB's darknet fork. Works with CUDA 10.1 and OpenCV 4.1 or later (I use OpenCV master as of Jun

Prevent `CUDA error: out of memory` in just 1 line of code.
Prevent `CUDA error: out of memory` in just 1 line of code.

🐨 Koila Koila solves CUDA error: out of memory error painlessly. Fix it with just one line of code, and forget it. 🚀 Features 🙅 Prevents CUDA error

An addernet CUDA version

Training addernet accelerated by CUDA Usage cd adder_cuda python setup.py install cd .. python main.py Environment pytorch 1.10.0 CUDA 11.3 benchmark

Comments
  • GPU under utilization due to low occupancy.

    GPU under utilization due to low occupancy.

    Thank you for the warprnnt_numba, I got the warnning (show blow) when I use this loss in my code. 1650880807(1) Is this known issue? How can it be debugged and solved?

    Thank you!

    opened by jiay7 2
  • Fix runtime speed

    Fix runtime speed

    Improve runtime speed of numba loss

    • Fix issue with data movement of costs tensor from llForward to pytorch data view in numba
    • This alone costs a linear loop (scaling with batch size) that is roughly 10x the kernel costs themselves.
    • Fix by writing a small kernel to copy the data and update the costs.
    opened by titu1994 0
Releases(v0.4.0)
  • v0.4.0(Jan 30, 2022)

    Supports

    • Simple RNNT loss with Atomic Locks implementation

    Improvements

    • Improve runtime speed of numba loss
      • Fix issue with data movement of costs tensor from llForward to pytorch data view in numba
      • This alone costs a linear loop (scaling with batch size) that is roughly 10x the kernel costs themselves.
      • Fix by writing a small kernel to copy the data and update the costs.
    Source code(tar.gz)
    Source code(zip)
  • v0.2.2(Jan 24, 2022)

    Initial release of Warp RNNT loss with Numba JIT compile (CPU/CUDA)

    Supports:

    1. Pytorch RNNT loss (CPU and JIT compiled CUDA)
    2. FastEmit
    3. Gradient clipping
    Source code(tar.gz)
    Source code(zip)
Owner
Somshubra Majumdar
Interested in Machine Learning, Deep Learning and Data Science in general
Somshubra Majumdar
Romanian Automatic Speech Recognition from the ROBIN project

RobinASR This repository contains Robin's Automatic Speech Recognition (RobinASR) for the Romanian language based on the DeepSpeech2 architecture, tog

RACAI 10 Jan 01, 2023
Code and data for the paper "Hearing What You Cannot See"

Hearing What You Cannot See: Acoustic Vehicle Detection Around Corners Public repository of the paper "Hearing What You Cannot See: Acoustic Vehicle D

TU Delft Intelligent Vehicles 26 Jul 13, 2022
Weight estimation in CT by multi atlas techniques

maweight A Python package for multi-atlas based weight estimation for CT images, including segmentation by registration, feature extraction and model

György Kovács 0 Dec 24, 2021
Implementation of our paper "DMT: Dynamic Mutual Training for Semi-Supervised Learning"

DMT: Dynamic Mutual Training for Semi-Supervised Learning This repository contains the code for our paper DMT: Dynamic Mutual Training for Semi-Superv

Zhengyang Feng 120 Dec 30, 2022
Like a cowsay but without cows!

Foxsay This is a simple program that generates pictures of a cute fox with a message. It is like a cowsay but without cows! Fox girls are better! Usag

Anastasia Kim 28 Feb 20, 2022
Real-Time Seizure Detection using EEG: A Comprehensive Comparison of Recent Approaches under a Realistic Setting

Real-Time Seizure Detection using Electroencephalogram (EEG) This is the repository for "Real-Time Seizure Detection using EEG: A Comprehensive Compar

AITRICS 30 Dec 17, 2022
Synthesizing Long-Term 3D Human Motion and Interaction in 3D in CVPR2021

Long-term-Motion-in-3D-Scenes This is an implementation of the CVPR'21 paper "Synthesizing Long-Term 3D Human Motion and Interaction in 3D". Please ch

Jiashun Wang 76 Dec 13, 2022
Generating Fractals on Starknet with Cairo

StarknetFractals Generating the mandelbrot set on Starknet Current Implementation generates 1 pixel of the fractal per call(). It takes a few minutes

Orland0x 10 Jul 16, 2022
Official Pytorch implementation of "Learning Debiased Representation via Disentangled Feature Augmentation (Neurips 2021, Oral)"

Learning Debiased Representation via Disentangled Feature Augmentation (Neurips 2021, Oral): Official Project Webpage This repository provides the off

Kakao Enterprise Corp. 68 Dec 17, 2022
PyTorch implementation(s) of various ResNet models from Twitch streams.

pytorch-resnet-twitch PyTorch implementation(s) of various ResNet models from Twitch streams. Status: ResNet50 currently not working. Will update in n

Daniel Bourke 3 Jan 11, 2022
A tensorflow model that predicts if the image is of a cat or of a dog.

Quick intro Hello and thank you for your interest in my project! This is the backend part of a two-repo application. The other part can be found here

Tudor Matei 0 Mar 08, 2022
PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.

PySlowFast PySlowFast is an open source video understanding codebase from FAIR that provides state-of-the-art video classification models with efficie

Meta Research 5.3k Jan 03, 2023
Code for Dual Contrastive Learning for Unsupervised Image-to-Image Translation, NTIRE, CVPRW 2021.

arXiv Dual Contrastive Learning Adversarial Generative Networks (DCLGAN) We provide our PyTorch implementation of DCLGAN, which is a simple yet powerf

119 Dec 04, 2022
ML-based medical imaging using Azure

Disclaimer This code is provided for research and development use only. This code is not intended for use in clinical decision-making or for any other

Microsoft Azure 68 Dec 23, 2022
BOVText: A Large-Scale, Multidimensional Multilingual Dataset for Video Text Spotting

BOVText: A Large-Scale, Bilingual Open World Dataset for Video Text Spotting Updated on December 10, 2021 (Release all dataset(2021 videos)) Updated o

weijiawu 47 Dec 26, 2022
Adversarial Learning for Semi-supervised Semantic Segmentation, BMVC 2018

Adversarial Learning for Semi-supervised Semantic Segmentation This repo is the pytorch implementation of the following paper: Adversarial Learning fo

Wayne Hung 464 Dec 19, 2022
This repository collects project-relevant Isabelle/HOL formalizations.

Isabelle/HOL formalizations related to the AuReLeE project Formalization of Abstract Argumentation Frameworks See AbstractArgumentation folder for the

AuReLeE project 1 Sep 10, 2022
A user-friendly research and development tool built to standardize RL competency assessment for custom agents and environments.

Built with ❤️ by Sam Showalter Contents Overview Installation Dependencies Usage Scripts Standard Execution Environment Development Environment Benchm

SRI-AIC 1 Nov 18, 2021
[ICCV'21] UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction

UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction Project Page | Paper | Supplementary | Video This reposit

331 Dec 28, 2022
PyTorch implementation of "Learning to Discover Cross-Domain Relations with Generative Adversarial Networks"

DiscoGAN in PyTorch PyTorch implementation of Learning to Discover Cross-Domain Relations with Generative Adversarial Networks. * All samples in READM

Taehoon Kim 1k Jan 04, 2023