Implementation of "Slow-Fast Auditory Streams for Audio Recognition" (ICASSP 2021) in PyTorch

Overview

Auditory Slow-Fast

This repository implements the model proposed in the paper:

Evangelos Kazakos, Arsha Nagrani, Andrew Zisserman, Dima Damen, Slow-Fast Auditory Streams for Audio Recognition, ICASSP, 2021

Project's webpage

arXiv paper

Citing

When using this code, please cite:

@ARTICLE{Kazakos2021SlowFastAuditory,
  title   = {Slow-Fast Auditory Streams for Audio Recognition},
  author  = {Kazakos, Evangelos and Nagrani, Arsha and Zisserman, Andrew and Damen, Dima},
  journal = {CoRR},
  volume  = {abs/2103.03516},
  year    = {2021},
  ee      = {https://arxiv.org/abs/2103.03516},
}

Pretrained models

You can download our models pretrained on VGG-Sound and EPIC-KITCHENS-100:

  • Slow-Fast (EPIC-KITCHENS-100) link
  • Slow (EPIC-KITCHENS-100) link
  • Fast (EPIC-KITCHENS-100) link
  • Slow-Fast (VGG-Sound) link
  • Slow (VGG-Sound) link
  • Fast (VGG-Sound) link
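
The checkpoints above are regular PyTorch files and are passed to TRAIN.CHECKPOINT_FILE_PATH / TEST.CHECKPOINT_FILE_PATH in the commands below. If you want to peek inside a downloaded file first, the following is a minimal sketch; it assumes a SlowFast-style layout (a dict with a model_state entry), which this repository does not guarantee, so adjust the keys to whatever you actually see:

import torch

# Load on CPU so no GPU is needed just to inspect the file; the path is a placeholder.
ckpt = torch.load("/path/to/checkpoint.pyth", map_location="cpu")

# Top-level keys (names such as 'model_state' or 'epoch' are an assumption
# based on SlowFast conventions, not confirmed by this repository).
print(list(ckpt.keys()))

# If the weights live under 'model_state', print a few parameter names and shapes.
state = ckpt.get("model_state", ckpt)
for name, tensor in list(state.items())[:5]:
    print(name, tuple(tensor.shape))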

Preparation

  • Requirements:
    • PyTorch 1.7.1
    • librosa: conda install -c conda-forge librosa
    • h5py: conda install h5py
    • wandb: pip install wandb
    • fvcore: pip install 'git+https://github.com/facebookresearch/fvcore'
    • simplejson: pip install simplejson
    • psutil: pip install psutil
    • tensorboard: pip install tensorboard
  • Add this repository to $PYTHONPATH.
export PYTHONPATH=/path/to/auditory-slow-fast/slowfast:$PYTHONPATH
  • VGG-Sound:
    1. Download the audio. For instructions see here
    2. Download train.pkl (link) and test.pkl (link). I converted the original train.csv and test.csv (found here) to pickle files with named columns for easier use; a quick sanity-check sketch is shown at the end of this section.
  • EPIC-KITCHENS:
    1. From the annotation repository of EPIC-KITCHENS-100 (link), download: EPIC_100_train.pkl, EPIC_100_validation.pkl, and EPIC_100_test_timestamps.pkl. EPIC_100_train.pkl and EPIC_100_validation.pkl will be used for training/validation, while EPIC_100_test_timestamps.pkl can be used to obtain the scores to submit to the Action Recognition (AR) challenge.
    2. Download all the videos of EPIC-KITCHENS-100 using the download scripts found here, where you can also find detailed instructions on using the scripts.
    3. Extract audio from the videos by running:
    python audio_extraction/extract_audio.py /path/to/videos /output/path 
    
    4. Save audio in HDF5 format by running:
    python audio_extraction/wav_to_hdf5.py /path/to/audio /output/hdf5/EPIC-KITCHENS-100_audio.hdf5
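
After these steps you can sanity-check both the annotation pickles (VGG-Sound's train.pkl/test.pkl or EPIC's EPIC_100_train.pkl) and the packed audio. The sketch below only assumes that the .pkl files are pickled pandas DataFrames, as described above, and that wav_to_hdf5.py writes one dataset per video ID; the exact HDF5 key layout is an assumption, so adapt the prints to what you find:

import h5py
import pandas as pd

# The annotation pickles should load as DataFrames with named columns.
# The same check applies to the VGG-Sound train.pkl/test.pkl files.
train = pd.read_pickle("/path/to/annotations/EPIC_100_train.pkl")
print(train.shape)
print(train.columns.tolist())

# List a few keys of the packed audio and the shape of one stored waveform.
with h5py.File("/path/to/EPIC-KITCHENS-100_audio.hdf5", "r") as f:
    keys = list(f.keys())
    print(len(keys), keys[:3])
    print(f[keys[0]].shape)  # assumed to be a 1-D array of audio samples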
    

Training/validation on EPIC-KITCHENS-100

To train the model (fine-tuning from the VGG-Sound pretrained model), run:

python tools/run_net.py --cfg configs/EPIC-KITCHENS/SLOWFAST_R50.yaml NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/output_dir EPICKITCHENS.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 \
EPICKITCHENS.ANNOTATIONS_DIR /path/to/annotations TRAIN.CHECKPOINT_FILE_PATH /path/to/VGG-Sound/pretrained/model

To train from scratch, remove TRAIN.CHECKPOINT_FILE_PATH /path/to/VGG-Sound/pretrained/model.
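
For reference, the same command without the pretrained checkpoint (i.e. training from scratch) looks like this:

python tools/run_net.py --cfg configs/EPIC-KITCHENS/SLOWFAST_R50.yaml NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/output_dir EPICKITCHENS.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 \
EPICKITCHENS.ANNOTATIONS_DIR /path/to/annotations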

You can also train the individual streams. For example, to train the Slow stream run:

python tools/run_net.py --cfg configs/EPIC-KITCHENS/SLOW_R50.yaml NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/output_dir EPICKITCHENS.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 \
EPICKITCHENS.ANNOTATIONS_DIR /path/to/annotations TRAIN.CHECKPOINT_FILE_PATH /path/to/VGG-Sound/pretrained/model

To validate the model run:

python tools/run_net.py --cfg configs/EPIC-KITCHENS/SLOWFAST_R50.yaml NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/experiment_dir EPICKITCHENS.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 \
EPICKITCHENS.ANNOTATIONS_DIR /path/to/annotations TRAIN.ENABLE False TEST.ENABLE True \
TEST.CHECKPOINT_FILE_PATH /path/to/experiment_dir/checkpoints/checkpoint_best.pyth

To obtain scores on the test set run:

python tools/run_net.py --cfg configs/EPIC-KITCHENS/SLOWFAST_R50.yaml NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/experiment_dir EPICKITCHENS.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 \
EPICKITCHENS.ANNOTATIONS_DIR /path/to/annotations TRAIN.ENABLE False TEST.ENABLE True \
TEST.CHECKPOINT_FILE_PATH /path/to/experiment_dir/checkpoints/checkpoint_best.pyth \
EPICKITCHENS.TEST_LIST EPIC_100_test_timestamps.pkl EPICKITCHENS.TEST_SPLIT test

Training/validation on VGG-Sound

To train the model run:

python tools/run_net.py --cfg configs/VGG-Sound/SLOWFAST_R50.yaml NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/output_dir VGGSOUND.AUDIO_DATA_DIR /path/to/dataset \
VGGSOUND.ANNOTATIONS_DIR /path/to/annotations

To validate the model run:

python tools/run_net.py --cfg configs/VGG-Sound/SLOWFAST_R50.yaml NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/experiment_dir VGGSOUND.AUDIO_DATA_DIR /path/to/dataset \
VGGSOUND.ANNOTATIONS_DIR /path/to/annotations TRAIN.ENABLE False TEST.ENABLE True \
TEST.CHECKPOINT_FILE_PATH /path/to/experiment_dir/checkpoints/checkpoint_best.pyth

License

The code is published under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, found here.

Owner
Evangelos Kazakos