🔊 Audio and fastai v2

Last update: Dec 28, 2022

Overview

Fastaudio

An audio module for fastai v2. We want to help you build audio machine learning applications while minimizing the need for audio domain expertise. Currently under development.

Quick Start

Google Colab Notebook

Zachary Mueller's class

Install

Install using pip:

pip install fastaudio

If you plan on contributing to the library instead, you will need to do an editable install:

# Optional step if using conda
conda create -n fastaudio python=3.7
conda activate fastaudio

# Editable install
git clone https://github.com/fastaudio/fastaudio.git
cd fastaudio
pip install -e .[dev,testing]
pre-commit install

Testing

To run the tests and verify everything is working, run the following command from the fastaudio/ folder (only applicable after doing the editable install steps):

pytest

This will run the whole test suite, report any errors, and produce a code coverage report. Additionally, pre-commit automatically runs extra checks on every commit to verify formatting and catch flake8 violations. If you want to run those manually, the command is pre-commit run

Contributing to the library

We are looking for contributors of all skill levels. If you don't have time to contribute, please at least reach out and give us some feedback on the library.

Make sure that you have activated the environment that you used pre-commit install in so that pre-commit knows where to run the git hooks.

How to contribute

Create issues, write documentation, suggest/add features, submit PRs. We are open to anything. A good first step would be posting in the v2 audio thread introducing yourself.

Note

This project has been set up using PyScaffold 3.2.3. For details and usage information on PyScaffold see https://pyscaffold.org/.

Citation

If you used this library in any research, please cite us.

@misc{coultas_blum_scart_bracco_2020,
 title={Fastaudio},
 url={https://github.com/fastaudio/fastaudio},
 journal={GitHub},
 author={Coultas Blum, Harry A and Scart, Lucas G. and Bracco, Robert},
 year={2020},
 month={Aug}
}

Comments

Porting Transforms to GPU

Porting the transforms to GPU seems in demand. I've been working on some of these and have some proof-of-concept implementations I'd like to upstream - opening a ticket so we have a central place to discuss it.
enhancement

opened by jcaw 20
GPU-Compatible Batch Versions of Existing Transforms
Introduction

This PR adds GPU implementations for the following transforms:

Signal: AddNoise, ChangeVolume, SignalCutout, SignalLoss

Spectrogram: TfmResize, Delta, MaskFreq, MaskTime

The GPU implementations are currently added alongside the originals, e.g. Delta vs. DeltaGPU. I propose replacing the originals outright where possible, but I've done a more thorough analysis with benchmarking below.

Demos

I set up a Colab notebook with demos of the new transforms here. (It might make sense to turn this into documentation for all transforms at some point.)

Automatic Batching

I've added a wrapper, @auto_batch, that can be added to the encodes method of any batch transform to make it compatible with items too. You just need to specify the number of dimensions in a single item, and when item tensors are received, they will have a dummy batch dimension added for the transform.

As a result, all of these transforms work on both batches and items, with no user intervention.
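To make the wrapper idea concrete, here is a minimal sketch of how such a decorator could work. It is not the PR's actual code: the name auto_batch follows the text, but the signature, the item_dims parameter name, and the halve_volume function are assumptions, it wraps a plain function rather than an encodes method, and NumPy arrays stand in for tensors (the same `x[None]` and `.ndim` operations also exist on torch tensors).

```python
import functools

import numpy as np


def auto_batch(item_dims):
    """Let a batch-only function also accept single items (hypothetical sketch).

    item_dims: number of dimensions in ONE item, e.g. 2 for (channels, samples).
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(x, *args, **kwargs):
            if x.ndim == item_dims:
                # Single item: add a dummy batch dimension, run, then strip it.
                return fn(x[None], *args, **kwargs)[0]
            if x.ndim == item_dims + 1:
                # Already a batch: pass it through untouched.
                return fn(x, *args, **kwargs)
            raise ValueError(
                f"expected {item_dims} or {item_dims + 1} dims, got {x.ndim}"
            )
        return wrapper
    return decorator


@auto_batch(item_dims=2)
def halve_volume(batch):
    # Written for batches only: (batch, channels, samples).
    assert batch.ndim == 3
    return batch * 0.5


item_out = halve_volume(np.ones((1, 16000)))      # single item
batch_out = halve_volume(np.ones((8, 1, 16000)))  # full batch
```

The transform body only ever sees batched input, which is what keeps a single implementation usable at both the item and the batch level.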

(The overhead of the wrapper is measured below.)

Changes in Behaviour

Some methods have had their behaviour expanded/altered in the port.

AudioTensor

AddNoise - the GPU version exposes a minimum value for the noise, and allows the transform to be applied to random items. I have seen certain nets degrade when noise is added to all samples, seemingly because they learn to always expect background noise and don't know how to deal with clean samples. Adding noise to a subset of items fixes this, so it seems a sensible addition.

Noise values are also changed so the max & min are relative to one standard deviation of the original tensor, not the range -1 to 1, so the noise level is consistent relative to the average of the signal. This has its own drawbacks (e.g. samples that are mostly silence get much less noise), so I'm not sure which method is better. Let me know which you prefer; I can change it easily.

(Noise can now be added directly to spectrograms too.)
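The AddNoise behaviour described above (noise only on a random subset of items, scaled relative to each item's standard deviation) can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: the function name, the parameter names, the use of white Gaussian noise, and NumPy in place of torch are all choices made for this example.

```python
import numpy as np


def add_noise_relative(signal, min_level=0.0, max_level=0.1, p=0.5, rng=None):
    """Add noise to a random subset of a batch shaped (batch, channels, samples).

    Noise amplitude is drawn per item between min_level and max_level,
    expressed as a fraction of that item's standard deviation.
    Returns the noisy batch and a boolean mask of the items that were changed.
    """
    rng = np.random.default_rng() if rng is None else rng
    batch = signal.shape[0]
    # Decide which items receive noise at all.
    apply = rng.random(batch) < p
    # One noise level per item, relative to the item's own spread.
    level = rng.uniform(min_level, max_level, size=batch)
    std = signal.reshape(batch, -1).std(axis=1)
    scale = (apply * level * std).reshape(batch, 1, 1)
    noise = rng.standard_normal(signal.shape)
    return signal + scale * noise, apply


rng = np.random.default_rng(0)
clean = rng.standard_normal((8, 1, 1000))
noisy, applied = add_noise_relative(clean, min_level=0.05, max_level=0.1, rng=rng)
```

Because the scale is tied to the item's standard deviation, a mostly silent item gets a correspondingly small amount of noise, which is the drawback mentioned above.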

ChangeVolume - no change.

SignalCutout - a minimum cutout percentage is now exposed.

SignalLoss - no change.

AudioSpectrogram

TfmResize - no change.

Delta - the GPU version now uses torchaudio to compute deltas, and exposes the padding mode.
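For readers unfamiliar with delta features, the sketch below shows the standard regression formula for deltas, d_t = sum_{n=1..N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2), with a configurable padding mode. It is written in NumPy for illustration only; the function name and defaults are assumptions, and NumPy's "edge" padding is used here as the counterpart of replicate padding.

```python
import numpy as np


def compute_deltas(spec, win_length=5, pad_mode="edge"):
    """Delta features along the last (time) axis of spec, shaped (..., freq, time)."""
    n = (win_length - 1) // 2
    denom = 2 * sum(k * k for k in range(1, n + 1))
    # Pad only the time axis so every frame has n neighbours on each side.
    pad = [(0, 0)] * (spec.ndim - 1) + [(n, n)]
    padded = np.pad(spec, pad, mode=pad_mode)
    t = spec.shape[-1]
    out = np.zeros(spec.shape, dtype=float)
    for k in range(1, n + 1):
        ahead = padded[..., n + k : n + k + t]   # c_{t+k}
        behind = padded[..., n - k : n - k + t]  # c_{t-k}
        out += k * (ahead - behind)
    return out / denom


ramp = np.arange(6.0).reshape(1, 6)  # one frequency bin, six time frames
deltas = compute_deltas(ramp)
```

On a linear ramp the interior deltas equal the slope (1.0), while the frames near the edges are pulled down by the padding, which is why the padding mode is worth exposing.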

MaskFreq & MaskTime - the GPU version is modified to more closely match the original SpecAugment paper, with a couple of other additions:

A. Masks are now assigned a random width (within a specified range).

B. The replacement value is now the mean of the masked area, to avoid changing the spectrogram's overall mean (although the standard deviation will still be affected). This can be overridden by specifying a mask_val.

C. You can no longer specify where you want the mask(s) to start. It's always random.

D. One objective of SpecAugment masking seems to be encouraging the network to learn how to look at parts of the spectrogram it would otherwise ignore. The same mask will now span across all channels, to ensure the net does not avoid this by inferring the missing information from the same region in another channel (which is likely to be quite similar).
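The masking rules listed above can be illustrated with a small time-mask sketch. This is not the PR's code: the function name, parameter names, and the choice to take the mean over all channels of the masked region at once are assumptions, and NumPy stands in for torch.

```python
import numpy as np


def mask_time(spec, min_width=1, max_width=8, mask_val=None, rng=None):
    """Mask a random span of time frames in spec, shaped (channels, freq, time).

    The same span is masked in every channel. Unless mask_val is given, the
    span is filled with its own mean, which leaves the overall mean unchanged.
    Returns the masked copy and the (start, width) that was chosen.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_time = spec.shape[-1]
    # Random width within the allowed range, random start that still fits.
    width = int(rng.integers(min_width, max_width + 1))
    start = int(rng.integers(0, n_time - width + 1))
    out = spec.copy()
    region = out[..., start : start + width]
    fill = region.mean() if mask_val is None else mask_val
    out[..., start : start + width] = fill
    return out, (start, width)


rng = np.random.default_rng(0)
spec = rng.random((2, 40, 100))
masked, (start, width) = mask_time(spec, min_width=4, max_width=10, rng=rng)
```

A frequency mask would follow the same pattern on the second-to-last axis.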

Benchmarks

I benchmarked the new transforms on two boxes:

Local - Nvidia GTX 970, i5-4590 (4 cores, 3.30GHz)

Colab - Tesla T4, Intel Xeon (2 cores, 2.20GHz)

Results are presented for both. Benchmarks are repeated 1000 times on the Colab box and 100 times locally, except for the batch_size=64 tests, which are repeated 250 times on the Colab box and 25 times locally. These benchmarks are on the plain Python versions of the transforms; I haven't compiled them to torchscript. Let me know how you'd like me to interact with torchscript and whether you'd like me to benchmark that too.
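A repeated-timing harness of the kind described above might look like the sketch below. It is not the benchmarking script linked from this PR; the function names and the dummy transform are placeholders, and it uses only the standard library.

```python
import statistics
import time


def benchmark(fn, *args, repeats=100, warmup=5):
    """Time fn(*args) `repeats` times and return summary statistics in seconds."""
    # Warm-up runs keep one-off setup costs out of the measurements.
    for _ in range(warmup):
        fn(*args)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        # When timing GPU work, the device must be synchronized before the
        # clock is read (e.g. torch.cuda.synchronize()), otherwise only the
        # kernel launch is measured.
        timings.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(timings),
        "stdev": statistics.stdev(timings),
        "repeats": repeats,
    }


def dummy_transform(samples):
    # Placeholder for a real transform: halve every sample.
    return [s * 0.5 for s in samples]


result = benchmark(dummy_transform, [0.0] * 16000, repeats=25)
```

Reporting the standard deviation alongside the mean helps judge whether a difference between two implementations is larger than the run-to-run noise.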

Some results for the replacement delta transforms are missing on GPU due to an upstream bug affecting large tensors. It gets tripped due to the way torchaudio packs spectrograms for the delta method. This bug might not fire on newer cards (it may be related to the maximum CUDA block size). I can pull DeltaGPU out into a separate PR if you'd like to wait until the upstream issues are fixed.

Old vs. New Implementations

I compared the execution speed on CPU between the old and new methods to establish which of the new methods add unacceptable overhead and which should replace the old implementations. Benchmarking script here.

Results are split between AudioTensor and AudioSpectrogram objects. These operations are performed on single items with no batch dimension.

AudioTensor

Colab (Xeon, x2 @ 2.20 GHz)

Local (i5-4590, x4 @ 3.30 GHz)

Conclusion

Based on these results I propose:

AddNoise - replace. The GPU-compatible version seems to have similar overhead, but does more.

ChangeVolume - replace. This method is so fast to begin with that the loss of efficiency is not likely to be significant relative to the entire pipeline. Auto-batching may also be responsible for a chunk of this.

SignalCutout - undecided. The GPU-compatible version is slower, but also allows a minimum cut percentage to be specified. If the original is kept, I think it should also add that.

SignalLoss - replace. The additional overhead of the GPU version appears minimal. I propose replacing for cleanliness.

AudioSpectrogram

Colab (Xeon, 2 Cores @ 2.20 GHz)

Local (i5-4590, 4 Cores @ 3.30 GHz)

Conclusion

TfmResize - replace. The new implementation is comparable.

Delta - replace. The new implementation is much faster.

MaskTime - replace. The new implementation is comparable but does a lot more.

MaskFreq - replace. The new implementation is much slower but does a lot more. Interestingly, this shows the overhead of the conversion operations in the original MaskTime implementation - they dwarf the underlying transform.

GPU Performance

I've also benchmarked the performance on GPU, to give an idea of how the new implementations scale and the relative overhead of the different operations.

I'm suspicious of the CPU vs. GPU results on the GTX 970 box. I think the GPU should be performing dramatically better (those that I've used in a real training loop have negligible overhead compared to their CPU counterparts), so there may be a problem in the benchmarking script. I believe the 970 also has strange memory characteristics that mean the upper portion of its VRAM is slow compared to the rest - I don't know if that would be affecting things.

GPU Only, Various Batch Sizes

This is just to illustrate how each transform scales. Some transforms at larger batch sizes are missing due to CUDA memory errors.

Tesla T4

GTX 970

GPU vs. CPU, Batch Size = 32

Tesla T4

GTX 970

GPU vs. CPU, Batch Size = 64

Tesla T4

GTX 970

Automatic Batching

I've measured two dummy transforms that do nothing - one with the @auto_batch wrapper and one without. The wrapper is very cheap, adding minimal overhead.

Colab (Tesla, Xeon)

Local (970, i5-4590)

Tests

Tests are not final. I've currently just switched them over to the GPU versions of the transforms. Once a final set of transforms has been decided, I can concretize the tests.

Docstrings

Docstrings aren't final. I'll write them once the code is finalised. Let me know what format you'd prefer.

FastAI API Integration

I've currently set the transforms up as subclasses of the basic Transform class, but this might not be ideal. I'm not familiar enough with the subclasses of Transform to know if this is correct. Perhaps DisplayedTransform would be preferable?

Conclusion

Let me know whether these implementations are acceptable and which of the original transforms you would like to keep/discard. I think it makes sense to merge these here initially, then I can look at upstreaming relevant transforms (E.g. the new SpecAugment masking implementation) into torchaudio or torch-audiomentations.
enhancement released
opened by jcaw 11

AudioSpectrogram failed to be plotted under ClassificationInterpretation

I have made a public kaggle kernel for reproducing the bug as your reference (look at section 3, "Failed to Show AudioSpectrogram", for the main point): https://www.kaggle.com/alexlwh/failed-to-show-audiospec-tensor

Essentially, I tried to show the predictions vs. ground truth using ClassificationInterpretation as follows:

interp = ClassificationInterpretation.from_learner(learner)

@patch
def __getitem__(self: Interpretation, idxs):
    if not is_listy(idxs):
        idxs = [idxs]
    attrs = 'inputs,preds,targs,decoded,losses'
    res = L([getattr(self, attr)[idxs] for attr in attrs.split(',')])
    return res

@patch
@delegates(TfmdDL.show_results)
def show_at(self: Interpretation, idxs, **kwargs):
    inp, _, targ, dec, _ = self[idxs]
    self.dl.show_results((inp, dec), targ, **kwargs)

interp.show_at(0)

And then I got the following error:

/opt/conda/lib/python3.7/site-packages/fastaudio/core/spectrogram.py in show(self, ctx, ax, title, **kwargs)
     75     def show(self, ctx=None, ax=None, title="", **kwargs):
     76         "Show spectrogram using librosa"
---> 77         return show_spectrogram(self, ctx=ctx, ax=ax, title=title, **kwargs)
     78 
     79 

/opt/conda/lib/python3.7/site-packages/fastaudio/core/spectrogram.py in show_spectrogram(sg, title, ax, ctx, **kwargs)
     87         ia = ax.inset_axes((i / sg.nchannels, 0.2, 1 / sg.nchannels, 0.7))
     88         z = specshow(
---> 89             channel.cpu().numpy(), ax=ia, **sg._all_show_args(show_y=i == 0), **kwargs
     90         )
     91         ia.set_title(f"Channel {i}")

/opt/conda/lib/python3.7/site-packages/librosa/display.py in specshow(data, x_coords, y_coords, x_axis, y_axis, sr, hop_length, fmin, fmax, tuning, bins_per_octave, key, Sa, mela, thaat, ax, **kwargs)
    843     # Get the x and y coordinates
    844     y_coords = __mesh_coords(y_axis, y_coords, data.shape[0], **all_params)
--> 845     x_coords = __mesh_coords(x_axis, x_coords, data.shape[1], **all_params)
    846 
    847     axes = __check_axes(ax)

/opt/conda/lib/python3.7/site-packages/librosa/display.py in __mesh_coords(ax_type, coords, n, **kwargs)
    915     if ax_type not in coord_map:
    916         raise ParameterError("Unknown axis type: {}".format(ax_type))
--> 917     return coord_map[ax_type](n, **kwargs)
    918 
    919 

/opt/conda/lib/python3.7/site-packages/librosa/display.py in __coord_time(n, sr, hop_length, **_kwargs)
   1194 def __coord_time(n, sr=22050, hop_length=512, **_kwargs):
   1195     """Get time coordinates from frames"""
-> 1196     return core.frames_to_time(np.arange(n + 1), sr=sr, hop_length=hop_length)

/opt/conda/lib/python3.7/site-packages/librosa/core/convert.py in frames_to_time(frames, sr, hop_length, n_fft)
    186     samples = frames_to_samples(frames, hop_length=hop_length, n_fft=n_fft)
    187 
--> 188     return samples_to_time(samples, sr=sr)
    189 
    190 

/opt/conda/lib/python3.7/site-packages/librosa/core/convert.py in samples_to_time(samples, sr)
    304     """
    305 
--> 306     return np.asanyarray(samples) / float(sr)
    307 
    308 

TypeError: float() argument must be a string or a number, not 'NoneType'

I suspect it is because at some point the AudioSpectrogram.sr failed to be propagated to ClassificationInterpretation, rendering interp.inputs.sr = None. (Not sure exactly at which point the issue happens yet):

interp.inputs._all_show_args()
>>
{'x_coords': None,
 'y_coords': None,
 'x_axis': 'time',
 'y_axis': 'mel',
 'sr': None,
 'hop_length': 1024,
 'fmin': None,
 'fmax': None,
 'tuning': 0.0,
 'bins_per_octave': 12,
 'key': 'C:maj',
 'Sa': None,
 'mela': None,
 'thaat': None,
 'cmap': 'viridis'}
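The failure at the bottom of the traceback can be reproduced without librosa: dividing by float(sr) raises a TypeError as soon as sr is None. The sketch below mirrors that division and shows one possible workaround, filling in a fallback sample rate before plotting. The helper names and the 16000 Hz fallback are hypothetical, and this only masks the symptom; it does not explain where sr is lost.

```python
def frames_to_seconds(n_frames, sr, hop_length):
    """Time in seconds at each frame boundary (same division as in the traceback)."""
    return [i * hop_length / float(sr) for i in range(n_frames + 1)]


def safe_show_args(show_args, fallback_sr):
    """Return a copy of the plotting arguments with a missing sr filled in."""
    fixed = dict(show_args)
    if fixed.get("sr") is None:
        fixed["sr"] = fallback_sr
    return fixed


show_args = {"sr": None, "hop_length": 1024}

try:
    frames_to_seconds(2, **show_args)
    failed = False
except TypeError:
    # float(None) is what raises in the original traceback.
    failed = True

fixed_args = safe_show_args(show_args, fallback_sr=16000)  # hypothetical rate
times = frames_to_seconds(2, **fixed_args)
```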

bug released

opened by riven314 9

Test Coverage Improvement
Various improvements to the tests

Closes #18

Removal of the need to download data for most of the tests. We are now using a util method to generate sin waves.
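As an illustration of why generated sine waves can replace downloaded data in tests, here is a minimal standard-library sketch. It is not the util method from this PR; the function name and its defaults are assumptions.

```python
import math


def sine_wave(freq_hz=440.0, duration_s=1.0, sr=16000, amplitude=1.0):
    """Return a list of samples for a pure tone at freq_hz, sampled at sr Hz."""
    n_samples = int(duration_s * sr)
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / sr)
        for i in range(n_samples)
    ]


wave = sine_wave(freq_hz=440.0, duration_s=0.5, sr=16000)
```

A synthetic signal like this has a known length, sample rate, and frequency content, so tests can assert on exact properties without any network access.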

released
opened by mogwai 9
Cannot install fastai with fastaudio
Hi. Every fastaudio update requires a specific version of fastai. This causes a problem each time I install like this:

pip install fastai==2.1.18 git+https://github.com/fastaudio/fastaudio.git

This is because fastaudio currently works with fastai 2.1.5.

Is there a recommended way to install it? (without a separate installation process)
question
opened by turgut090 8
Limit show batch figure count

If the batch_size param is rather high (in my case it was 512), then show_batch() produces almost unusable plots.

The suggestion would be to plot only the first n items, with the ability to override that limit.
enhancement

opened by dvisockas 6
chore: Formatting notebooks

This pull request cleans up small formatting problems in the tutorial notebooks so they render better on the docs page, and also removes some references to things from fastai v1 that don't make sense anymore, like using the full 10 speakers dataset when we only need one audio file.

The only code change was updating the link of 10 speakers sample. As both it and ESC-50 downloaded a file called master.zip, there could be problems where both files got mixed in the fastai cache and you end up with the wrong dataset.

For the notebook changes:

ESC-50:
* Removed empty cells
* Save output of show_batch and the training so users know what to expect while reading the tutorial
* Fix headings so Table of Contents is rendered correctly

Training tutorial:
* Removed, it's exactly the same file as ESC50

Introduction to fastaudio:
* Change data to 10 speakers sample from full 10 speakers

Introduction to audio:
* Removed Table of contents, as the docs page creates one
* Fix headings
* Remove reference to notebooks that don't exist
* Change data to 10 speakers sample from full 10 speakers
released

opened by scart97 5
Pypi release
Fixed problem while testing notebooks, where the latest version from master would be installed during the testing potentially breaking the tests.

Reverted back to the package inside the src/ folder

Added a new action to publish on pypi every new release created. This was tested on a fork to work correctly: https://github.com/scart97/fastaudio/releases/tag/v1.0.0 https://test.pypi.org/project/fastaudio/1.0.0/

released
opened by scart97 4
chore: Consolidate docs
Changes:

Repository readme.md and docs index page now point to the same file, so they are equal

The learning resources present on the wiki have been moved inside the docs instead

Small changes to the wording in the README and fix broken colab link

released
opened by scart97 4
AttributeError: 'Axes' object has no attribute 'get_array' in Training_tutorial.ipynb

I'm getting AttributeError: 'Axes' object has no attribute 'get_array' when executing dbunch.show_batch(figsize=(10, 5)) in the Training_tutorial.ipynb file.

bug

opened by TannerGilbert 4
Change AddNoise to RandTransform

Addresses https://github.com/fastaudio/fastaudio/issues/101. Additionally, I had to add .pre-commit-config.yaml to be able to commit. Not sure if it's correct.
released

opened by riven314 3
pip install doesn't install the correct versions of fastai

I see that fastai == 2.3.1 is pinned, but when I run pip install fastaudio, fastai version 2.1.9 and fastaudio 0.1.4 get installed and I don't understand why. It also happens when I have it in a requirements document and try to create a docker container.

This leads to an error when I run from fastai.vision.all import *

When I try to force pip install fastaudio==1.0.2, I get:

#16 1.136 ERROR: Could not find a version that satisfies the requirement torchaudio<0.9,>=0.7 (from fastaudio) (from versions: 0.10.0, 0.10.1, 0.10.2, 0.11.0, 0.12.0, 0.12.1, 0.13.0)
#16 1.136 ERROR: No matching distribution found for torchaudio<0.9,>=0.7

Any ideas to help fix this would be appreciated :)

opened by onedeeper 0
Installing fastaudio breaks fastai/ pytorch

fastai and fastcore are currently at 2.5.3 and 1.3.27. Installing fastaudio breaks fastai; I guess that has to do with PyTorch being downgraded.

Are there any reasons for the hard version requirements?

install_requires =
    fastai==2.3.1
    torchaudio>=0.7,<0.9
    librosa==0.8
    colorednoise>=1.1
    IPython #Temporary remove the bound on IPython
    fastcore==1.3.20
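To see why these pins conflict with newer environments, the sketch below checks version strings against the specifiers quoted in this thread. It is a simplified, standard-library-only illustration: the helper names are made up for this example, and it only understands plain numeric versions such as 0.13.0 (no pre-release or post-release tags).

```python
import operator

# Two-character operators are listed first so ">=" is not mistaken for ">".
OPS = [
    ("==", operator.eq), (">=", operator.ge), ("<=", operator.le),
    ("!=", operator.ne), ("<", operator.lt), (">", operator.gt),
]


def parse_version(text):
    """Turn '0.13.0' into (0, 13, 0)."""
    return tuple(int(part) for part in text.strip().split("."))


def pad(a, b):
    """Right-pad the shorter tuple with zeros so 0.8 compares equal to 0.8.0."""
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)), b + (0,) * (n - len(b))


def satisfies(installed, spec):
    """Check a version against a comma-separated spec such as '>=0.7,<0.9'."""
    have = parse_version(installed)
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, compare in OPS:
            if clause.startswith(symbol):
                a, b = pad(have, parse_version(clause[len(symbol):]))
                if not compare(a, b):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


# Versions mentioned in the issues above, checked against the pins.
torchaudio_ok = satisfies("0.13.0", ">=0.7,<0.9")
fastai_ok = satisfies("2.5.3", "==2.3.1")
fastcore_ok = satisfies("1.3.27", "==1.3.20")
```

For real projects, packaging.specifiers.SpecifierSet handles the full version grammar, and importlib.metadata.version can supply the installed version strings.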

opened by voibit 2
Need direction upgrade Fastaudio for Raspberry Pi 4 support (ARM64)

Hello,

On aarch64 I got this error during inference: .... RuntimeError: fft: ATen not compiled with MKL support .... The issue is addressed in the Torch 1.10.0 release: https://github.com/pytorch/pytorch/releases/tag/v1.10.0

Could someone please give direction on how to upgrade Fastaudio to support the framework versions (torchaudio 0.10.0, ...) that come with Torch 1.10.0?

Another question: is there any potential issue in using the latest FastCore/FastAi version instead of 1.3.20/2.3.1?

Thanks,

Victor

opened by WorkingClass 0
Model gets stuck during training

I am trying to run a FastAudio model with some songbird audio files downloaded from the internet. When I trained it with the 10 most common species, it worked fine. But after I downloaded more files, something makes the model stop without crashing. It happens when I use the lr_find() method and fine_tune. It just gets stuck. I checked (with pydub and librosa) whether I had any corrupted files, but couldn't find any. Does anyone know what I can do to overcome this? Maybe a different check for corrupted files? Thank you in advance

opened by ffreller 0
Windowing operation on spectrograms

I am working on cough detection using the COUGHVID dataset, where most of the audio files are 9 seconds long, but lengths range from 1 to 9 seconds. The cough score also ranges from 0.0 to 1.0, and no particular threshold can be used: there are samples with a score of around 0.3 that contain no cough, while others with a score of 0.2 contain a mild cough.

Now I want to keep my spectrogram window at around 2-3 seconds. How do I use windowing in fastaudio? Once I use 2-3 second clips, the labels will be polluted, since the cough may be present in only one of the clips, so please let me know what you think. Last but not least, how do I use label smoothing in fastaudio? Thanks...
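Independently of any fastaudio API, the slicing part of the question can be sketched as computing sample ranges for fixed-length windows. The function below is a generic illustration with made-up names and defaults (3-second windows with 50% overlap); it does not address how each window should be labelled.

```python
def window_bounds(n_samples, sr, window_s=3.0, hop_s=1.5):
    """Return (start, end) sample indices of fixed-length windows over a clip.

    Clips shorter than one window yield a single range covering the whole clip.
    A final window is aligned to the end so the tail is never dropped.
    """
    win = int(window_s * sr)
    hop = int(hop_s * sr)
    if n_samples <= win:
        return [(0, n_samples)]
    starts = list(range(0, n_samples - win + 1, hop))
    if starts[-1] + win < n_samples:
        starts.append(n_samples - win)
    return [(s, s + win) for s in starts]


long_clip = window_bounds(n_samples=9 * 16000, sr=16000)   # 9-second clip
short_clip = window_bounds(n_samples=1 * 16000, sr=16000)  # 1-second clip
```

Each (start, end) pair can then be used to slice the waveform before the spectrogram is computed.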

opened by m-ali-awan 0
Frequency max in audio_spec.show() not aligned with AudioSpectrogram's metadata?

I noticed the frequency range (y-axis) of the displayed plot for audio_spec.show() is fixed regardless of the sampling rate/ f_max from audio_spec or SpectrogramTransform (i.e. it's always capped at ~8192 Hz in y-axis).
I think the frequency range should be dependent on these 2 parameters.

Below kernel illustrates the issue and attaches my proposed solution, see if it makes sense. If so, I can raise a PR for that (my proposed solution is written as a patching, but in my PR, I will change the function in place): https://www.kaggle.com/alexlwh/rfcx-tmp?scriptVersionId=64349824
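The dependency on the two parameters can be stated in a few lines: the highest representable frequency is half the sampling rate, and an explicit f_max can only lower that. The sketch below is not the proposed patch from the linked kernel; display_fmax is a hypothetical helper whose result could be passed as the fmax argument that specshow accepts.

```python
def display_fmax(sr, f_max=None):
    """Upper frequency limit for the y-axis, given sampling rate and optional f_max."""
    nyquist = sr / 2  # highest frequency a signal sampled at sr can contain
    if f_max is None:
        return nyquist
    return min(f_max, nyquist)


limit_16k = display_fmax(16000)
limit_44k = display_fmax(44100)
limit_capped = display_fmax(44100, f_max=16000)
```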
bug

opened by riven314 0

Releases(v1.0.2)

v1.0.2 (2021-06-09)

Bug Fixes

Change AddNoise to RandTransform (#102) (e7a9caa)

v1.0.1 (2021-05-25)

Bug Fixes

update fastai to 2.3 (#99) (c01b9d0)

v1.0.0 (2021-05-23)

BREAKING CHANGE: GPU-Compatible batch transforms (#85) (38b4534), closes #85

BREAKING CHANGES

GPU-Compatible batch transforms (#85)

v0.1.6 (2021-05-22)

Bug Fixes

Fix issue with multiprocessing not keeping the Tensor metadata (#97) (a993533)

v0.1.5 (2021-03-12)

Bug Fixes

Update fastai (#89) (7d30fc7)

v0.1.4 (2021-01-04)

Bug Fixes

Upgrade to latest fastai version (#79) (e3fc388)

v0.1.3 (2020-12-13)

Bug Fixes

change linux backend to fix mp3 load bug (#77) (8189406)

v0.1.2 (2020-12-08)

Bug Fixes

change GITHUB_TOKEN to PAT so that the action triggers (fb30d65)

v0.1.1 (2020-12-08)

Bug Fixes

Removing version.txt from release (#74) (c30bde4)

v0.1.0 (2020-12-02)

Features

Automatic Versioning CI (c2b881a)

Owner

GitHub Repository https://fastaudio.github.io

