ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning. In ICCV, 2021.

Related tags

Deep Learningpytorch
Overview

ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning

This repository contains the code for our ICCV 2021 paper:

ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning
Sangho Lee*, Jiwan Chung*, Youngjae Yu, Gunhee Kim, Thomas Breuel, Gal Chechik, Yale Song (*: equal contribution)
[paper]

@inproceedings{lee2021acav100m,
    title="{ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning}",
    author={Sangho Lee and Jiwan Chung and Youngjae Yu and Gunhee Kim and Thomas Breuel and Gal Chechik and Yale Song},
    booktitle={ICCV},
    year=2021
}

System Requirements

  • Python >= 3.8.5
  • FFMpeg 4.3.1

Installation

  1. Install PyTorch 1.6.0, torchvision 0.7.0 and torchaudio 0.6.0 for your environment. Follow the instructions in HERE.

  2. Install the other required packages.

pip install -r requirements.txt
python -m nltk.downloader 'punkt'
pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/<cuda version>/torch1.6/index.html
pip install git+https://github.com/jiwanchung/slowfast
pip install torch-scatter==2.0.5 -f https://pytorch-geometric.com/whl/torch-1.6.0+<cuda version>.html

e.g. Replace <cuda version> with cu102 for CUDA 10.2.

Input File Structure

  1. Create the data directory
mkdir data
  1. Prepare the input file.

data/metadata.tsv should be structured as follows. We provide an example input file in examples/metadata.tsv

YOUTUBE_ID\t{"LatestDAFeature": {"Title": TITLE, "Description": DESCRIPTION, "YouTubeCategory": YOUTUBE_CATEGORY, "VideoLength": VIDEO_LENGTH}, "MediaVersionList": [{"Duration": DURATION}]}

Data Curation Pipeline

One-Liner

bash ./run.sh

To enable GPU computation, modify the CUDA_VISIBLE_DEVICES environment variable accordingly. For example, run the above command as export CUDA_VISIBLE_DEVICES=2,3; bash ./run.sh.

Step-by-Step

  1. Filter the videos with metadata.
bash ./metadata_filtering/code/run.sh

The above command will build the data/filtered.tsv file.

  1. Download the actual video files from youtube.
bash ./video_download/code/run.sh

Although we provide a simple download script, we recommend more scalable solutions for downloading large-scale data.

The above command will download the files to data/videos/raw directory.

  1. Segment the videos into 10-second clips.
bash ./clip_segmentation/code/run.sh

The above command will save the segmented clips to data/videos directory.

  1. Extract features from the clips.
bash ./feature_extraction/code/run.sh

The above command will save the extracted features to data/features directory.

This step requires GPU for faster computation.

  1. Perform clustering with the extracted features.
bash ./clustering/code/run.sh

The above command will save the extracted features to data/clusters directory.

This step requires GPU for faster computation.

  1. Select subset with high audio-visual correspondence using the clustering results.
bash ./subset_selection/code/run.sh

The above command will save the selected clip indices to data/datasets directory.

This step requires GPU for faster computation.

The final output should be saved in the data/output.csv file.

Output File Structure

output.csv is structured as follows. We provide an example output file at examples/output.csv.

# SHARD_NAME,FILENAME,YOUTUBE_ID,SEGMENT
shard-000009,qpxektwhzra_292.mp4,qpxektwhzra,"[292.3329999997, 302.3329999997]"

Evaluation

Instructions on downstream evaluation are provided in Evaluation.

Correspondence Retrieval

Instructions on correspondence retrieval experiments are provided in Correspondence Retrieval.

Owner
sangho.lee
sangho.lee
Visualizer for neural network, deep learning, and machine learning models

Netron is a viewer for neural network, deep learning and machine learning models. Netron supports ONNX (.onnx, .pb, .pbtxt), Keras (.h5, .keras), Tens

Lutz Roeder 21k Jan 06, 2023
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond

GCNet for Object Detection By Yue Cao, Jiarui Xu, Stephen Lin, Fangyun Wei, Han Hu. This repo is a official implementation of "GCNet: Non-local Networ

Jerry Jiarui XU 1.1k Dec 29, 2022
Code for paper "Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs"

This is the codebase for the paper: Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs Directory Structur

Peter Hase 19 Aug 21, 2022
EMNLP 2021 paper Models and Datasets for Cross-Lingual Summarisation.

This repository contains data and code for our EMNLP 2021 paper Models and Datasets for Cross-Lingual Summarisation. Please contact me at

9 Oct 28, 2022
Feature extraction made simple with torchextractor

torchextractor: PyTorch Intermediate Feature Extraction Introduction Too many times some model definitions get remorselessly copy-pasted just because

Antoine Broyelle 89 Oct 31, 2022
face_recognization (FaceNet) + TFHE (HNP) + hand_face_detection (Mediapipe)

SuperControlSystem Face_Recognization (FaceNet) 面部识别 (FaceNet) Fully Homomorphic Encryption over the Torus (HNP) 环面全同态加密 (TFHE) Hand_Face_Detection (M

liziyu0104 2 Dec 30, 2021
Multi Agent Reinforcement Learning for ROS in 2D Simulation Environments

IROS21 information To test the code and reproduce the experiments, follow the installation steps in Installation.md. Afterwards, follow the steps in E

11 Oct 29, 2022
Unadversarial Examples: Designing Objects for Robust Vision

Unadversarial Examples: Designing Objects for Robust Vision This repository contains the code necessary to replicate the major results of our paper: U

Microsoft 93 Nov 28, 2022
Official repository for "On Generating Transferable Targeted Perturbations" (ICCV 2021)

On Generating Transferable Targeted Perturbations (ICCV'21) Muzammal Naseer, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Fatih Porikli Paper:

Muzammal Naseer 46 Nov 17, 2022
Real-Time Multi-Contact Model Predictive Control via ADMM

Here, you can find the code for the paper 'Real-Time Multi-Contact Model Predictive Control via ADMM'. Code is currently being cleared up and optimize

17 Dec 28, 2022
Alias-Free Generative Adversarial Networks (StyleGAN3) Official PyTorch implementation

Alias-Free Generative Adversarial Networks (StyleGAN3) Official PyTorch implementation

NVIDIA Research Projects 4.8k Jan 09, 2023
A facial recognition doorbell system using a Raspberry Pi

Facial Recognition Doorbell This project expands on the person-detecting doorbell system to allow it to identify faces, and announce names accordingly

rydercalmdown 22 Apr 15, 2022
Llvlir - Low Level Variable Length Intermediate Representation

Low Level Variable Length Intermediate Representation Low Level Variable Length

Michael Clark 2 Jan 24, 2022
Code for our NeurIPS 2021 paper: Sparsely Changing Latent States for Prediction and Planning in Partially Observable Domains

GateL0RD This is a lightweight PyTorch implementation of GateL0RD, our RNN presented in "Sparsely Changing Latent States for Prediction and Planning i

Autonomous Learning Group 16 Nov 03, 2022
Resources complimenting the Machine Learning Course led in the Faculty of mathematics and informatics part of Sofia University.

Machine Learning and Data Mining, Summer 2021-2022 How to learn data science and machine learning? Programming. Learn Python. Basic Statistics. Take a

Simeon Hristov 8 Oct 04, 2022
TCPNet - Temporal-attentive-Covariance-Pooling-Networks-for-Video-Recognition

Temporal-attentive-Covariance-Pooling-Networks-for-Video-Recognition This is an implementation of TCPNet. Introduction For video recognition task, a g

Zilin Gao 21 Dec 08, 2022
DGL-TreeSearch and the Gurobi-MWIS interface

Independent Set Benchmarking Suite This repository contains the code for our maximum independent set benchmarking suite as well as our implementations

Maximilian Böther 19 Nov 22, 2022
DRIFT is a tool for Diachronic Analysis of Scientific Literature.

About DRIFT is a tool for Diachronic Analysis of Scientific Literature. The application offers user-friendly and customizable utilities for two modes:

Rajaswa Patil 108 Dec 12, 2022
Freecodecamp Scientific Computing with Python Certification; Solution for Challenge 2: Time Calculator

Assignment Write a function named add_time that takes in two required parameters and one optional parameter: a start time in the 12-hour clock format

Hellen Namulinda 0 Feb 26, 2022
WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose

WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose Yijun Zhou and James Gregson - BMVC2020 Abstract: We present an end-to-end head-pos

368 Dec 26, 2022