Localizing-Visual-Sounds-the-Hard-Way

Code and Dataset for "Localizing Visual Sounds the Hard Way".

The repo contains code and our pre-trained model.

Environment

Python 3.6.8
Pytorch 1.3.0

Flickr-SoundNet

We provide the pretrained model here.

To test the model, testing data and ground truth should be downloaded from learning to localize sound source.

Then run

python test.py --data_path "path to downloaded data with structure below/" --summaries_dir "path to pretrained models" --gt_path "path to ground truth" --testset "flickr"

VGG-Sound Source

We provide the pretrained model here.

To test the model, run

python test.py --data_path "path to downloaded data with structure below/" --summaries_dir "path to pretrained models" --testset "vggss"

(Note, some gt bounding boxes are updated recently, all results on VGG-SS cause a 2~3% difference on IoU.)

Both test data should be placed in the following structure.

data path
│
└───frames
│   │   image001.jpg
│   │   image002.jpg
│   │
└───audio
    │   audio011.wav
    │   audio012.wav

Citation

@InProceedings{Chen21,
              title        = "Localizing Visual Sounds the Hard Way",
              author       = "Honglie Chen, Weidi Xie, Triantafyllos Afouras, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman",
              booktitle    = "CVPR",
              year         = "2021"}

Localizing Visual Sounds the Hard Way

Related tags

Overview

Localizing-Visual-Sounds-the-Hard-Way

Environment

Flickr-SoundNet

VGG-Sound Source

Citation

Owner

Honglie Chen

HackBMU-5.0-Team-Ctrl-Alt-Elite - HackBMU 5.0 Team Ctrl Alt Elite

Using Machine Learning to Test Causal Hypotheses in Conjoint Analysis

Object detection on multiple datasets with an automatically learned unified label space.

PASTRIE: A Corpus of Prepositions Annotated with Supersense Tags in Reddit International English

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Python and C++ implementation of "MarkerPose: Robust real-time planar target tracking for accurate stereo pose estimation". Accepted at LXCV @ CVPR 2021.

Official pytorch implementation of "DSPoint: Dual-scale Point Cloud Recognition with High-frequency Fusion"

Source code for paper "Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling", AAAI 2021

Use graph-based analysis to re-classify stocks and to improve Markowitz portfolio optimization

Code for the TIP 2021 Paper "Salient Object Detection with Purificatory Mechanism and Structural Similarity Loss"

PyTorch code for the paper "Curriculum Graph Co-Teaching for Multi-target Domain Adaptation" (CVPR2021)

Underwater industrial application yolov5m6

Automatic learning-rate scheduler

Vehicle speed detection with python

This is a Keras-based Python implementation of DeepMask- a complex deep neural network for learning object segmentation masks

TensorFlow implementation of Elastic Weight Consolidation

Code for T-Few from "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning"

Worktory is a python library created with the single purpose of simplifying the inventory management of network automation scripts.

Simple improvement of VQVAE that allow to generate x2 sized images compared to baseline

🙄 Difficult algorithm, Simple code.