Code for the paper: Audio-Visual Scene Analysis with Self-Supervised Multisensory Features

Overview

[Paper] [Project page]

This repository contains code for the paper:

Andrew Owens, Alexei A. Efros. Audio-Visual Scene Analysis with Self-Supervised Multisensory Features. arXiv, 2018

Contents

This release includes code and models for:

  • On/off-screen source separation: separating the speech of an on-screen speaker from background sounds.
  • Blind source separation: audio-only source separation using u-net and PIT.
  • Sound source localization: visualizing the parts of a video that correspond to sound-making actions.
  • Self-supervised audio-visual features: a pretrained 3D CNN that can be used for downstream tasks (e.g. action recognition, source separation).

Setup

pip install tensorflow     # for CPU evaluation only
pip install tensorflow-gpu # for GPU support

We used TensorFlow version 1.8, which can be installed with:

pip install tensorflow-gpu==1.8
  • Install other python dependencies
pip install numpy matplotlib pillow scipy
  • Download the pretrained models and sample data
./download_models.sh
./download_sample_data.sh

Pretrained audio-visual features

We have provided the features for our fused audio-visual network. These features were learned through self-supervised learning. Please see shift_example.py for a simple example that uses these pretrained features.

Audio-visual source separation

To try the on/off-screen source separation model, run:

python sep_video.py ../data/translator.mp4 --model full --duration_mult 4 --out ../results/

This will separate a speaker's voice from that of an off-screen speaker. It will write the separated video files to ../results/, and will also display them in a local webpage, for easier viewing. This produces the following videos (click to watch):

Input On-screen Off-screen

We can visually mask out one of the two on-screen speakers, thereby removing their voice:

python sep_video.py ../data/crossfire.mp4 --model full --mask l --out ../results/
python sep_video.py ../data/crossfire.mp4 --model full --mask r --out ../results/

This produces the following videos (click to watch):

Source Left Right

Blind (audio-only) source separation

This baseline trains a u-net model to minimize a permutation invariant loss.

python sep_video.py ../data/translator.mp4 --model unet_pit --duration_mult 4 --out ../results/

The model will write the two separated streams in an arbitrary order.

Visualizing the locations of sound sources

To view the self-supervised network's class activation map (CAM), use the --cam flag:

python sep_video.py ../data/translator.mp4 --model full --cam --out ../results/

This produces a video in which the CAM is overlaid as a heat map:

Action recognition and fine-tuning

We have provided example code for training an action recognition model (e.g. on the UCF-101 dataset) in videocls.py). This involves fine-tuning our pretrained, audio-visual network. It is also possible to train this network with only visual data (no audio).

Citation

If you use this code in your research, please consider citing our paper:

@article{multisensory2018,
  title={Audio-Visual Scene Analysis with Self-Supervised Multisensory Features},
  author={Owens, Andrew and Efros, Alexei A},
  journal={arXiv preprint arXiv:1804.03641},
  year={2018}
}

Updates

  • 11/08/18: Fixed a bug in the class activation map example code. Added Tensorflow 1.9 compatibility.

Acknowledgements

Our u-net code draws from this implementation of pix2pix.

Semi-Supervised Signed Clustering Graph Neural Network (and Implementation of Some Spectral Methods)

SSSNET SSSNET: Semi-Supervised Signed Network Clustering For details, please read our paper. Environment Setup Overview The project has been tested on

Yixuan He 9 Nov 24, 2022
Neighborhood Contrastive Learning for Novel Class Discovery

Neighborhood Contrastive Learning for Novel Class Discovery This repository contains the official implementation of our paper: Neighborhood Contrastiv

Zhun Zhong 56 Dec 09, 2022
Code for ICCV2021 paper PARE: Part Attention Regressor for 3D Human Body Estimation

PARE: Part Attention Regressor for 3D Human Body Estimation [ICCV 2021] PARE: Part Attention Regressor for 3D Human Body Estimation, Muhammed Kocabas,

Muhammed Kocabas 277 Jan 03, 2023
Implementation of paper "Graph Condensation for Graph Neural Networks"

GCond A PyTorch implementation of paper "Graph Condensation for Graph Neural Networks" Code will be released soon. Stay tuned :) Abstract We propose a

Wei Jin 66 Dec 04, 2022
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more

Apache MXNet (incubating) for Deep Learning Apache MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to m

The Apache Software Foundation 20.2k Jan 05, 2023
AntroPy: entropy and complexity of (EEG) time-series in Python

AntroPy is a Python 3 package providing several time-efficient algorithms for computing the complexity of time-series. It can be used for example to e

Raphael Vallat 153 Dec 27, 2022
Source code of AAAI 2022 paper "Towards End-to-End Image Compression and Analysis with Transformers".

Towards End-to-End Image Compression and Analysis with Transformers Source code of our AAAI 2022 paper "Towards End-to-End Image Compression and Analy

37 Dec 21, 2022
Retrieval.pytorch - The code we used in [2020 DIGIX]

Retrieval.pytorch - The code we used in [2020 DIGIX]

Guo-Hua Wang 2 Feb 07, 2022
Re-TACRED: Addressing Shortcomings of the TACRED Dataset

Re-TACRED Re-TACRED: Addressing Shortcomings of the TACRED Dataset

George Stoica 40 Dec 10, 2022
Joint deep network for feature line detection and description

SOLD² - Self-supervised Occlusion-aware Line Description and Detection This repository contains the implementation of the paper: SOLD² : Self-supervis

Computer Vision and Geometry Lab 427 Dec 27, 2022
Patches desktop steam to look like the new steamdeck ui.

steam_deck_ui_patch The Deck UI patch will patch the regular desktop steam to look like the brand new SteamDeck UI. This patch tool currently works on

The_IT_Dude 3 Aug 29, 2022
Official implementation for “Unsupervised Low-Light Image Enhancement via Histogram Equalization Prior”

Unsupervised Low-Light Image Enhancement via Histogram Equalization Prior. The code will release soon. Implementation Python3 PyTorch=1.0 NVIDIA GPU+

FengZhang 34 Dec 04, 2022
Code for reproducing key results in the paper "InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets"

Status: Archive (code is provided as-is, no updates expected) InfoGAN Code for reproducing key results in the paper InfoGAN: Interpretable Representat

OpenAI 1k Dec 19, 2022
Neural networks applied in recognizing guitar chords using python, AutoML.NET with C# and .NET Core

Chord Recognition Demo application The demo application is written in C# with .NETCore. As of July 9, 2020, the only version available is for windows

Andres Mauricio Rondon Patiño 24 Oct 22, 2022
Code of TIP2021 Paper《SFace: Sigmoid-Constrained Hypersphere Loss for Robust Face Recognition》. We provide both MxNet and Pytorch versions.

SFace Code of TIP2021 Paper 《SFace: Sigmoid-Constrained Hypersphere Loss for Robust Face Recognition》. We provide both MxNet, PyTorch and Jittor versi

Zhong Yaoyao 47 Nov 25, 2022
Stereo Hybrid Event-Frame (SHEF) Cameras for 3D Perception, IROS 2021

For academic use only. Stereo Hybrid Event-Frame (SHEF) Cameras for 3D Perception Ziwei Wang, Liyuan Pan, Yonhon Ng, Zheyu Zhuang and Robert Mahony Th

Ziwei Wang 11 Jan 04, 2023
Code for the paper "TadGAN: Time Series Anomaly Detection Using Generative Adversarial Networks"

TadGAN: Time Series Anomaly Detection Using Generative Adversarial Networks This is a Python3 / Pytorch implementation of TadGAN paper. The associated

Arun 92 Dec 03, 2022
OptaPlanner wrappers for Python. Currently significantly slower than OptaPlanner in Java or Kotlin.

OptaPy is an AI constraint solver for Python to optimize the Vehicle Routing Problem, Employee Rostering, Maintenance Scheduling, Task Assignment, School Timetabling, Cloud Optimization, Conference S

OptaPy 211 Jan 02, 2023
SPTAG: A library for fast approximate nearest neighbor search

SPTAG: A library for fast approximate nearest neighbor search SPTAG SPTAG (Space Partition Tree And Graph) is a library for large scale vector approxi

Microsoft 4.3k Jan 01, 2023
Awesome Graph Classification - A collection of important graph embedding, classification and representation learning papers with implementations.

A collection of graph classification methods, covering embedding, deep learning, graph kernel and factorization papers

Benedek Rozemberczki 4.5k Jan 01, 2023