Official repository of PanoAVQA: Grounded Audio-Visual Question Answering in 360° Videos (ICCV 2021)

Last update: Dec 23, 2022

Overview

Pano-AVQA

[Paper] [Poster] [Video]

Getting Started

This code is based on the following libraries:

python=3.8
pytorch=1.7.0 (with cuda 10.2)

To create a virtual environment with all necessary libraries:

conda env create -f environment.yml
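
Once the environment is active, a quick check of the installed versions can catch mismatches early. The following is a minimal sketch, not part of the repository: the function name describe_environment is hypothetical, and it assumes the PyTorch package is registered under the distribution name torch.

```python
import sys
from importlib import metadata

# Versions listed in this README.
EXPECTED_PYTHON = (3, 8)
EXPECTED_TORCH = "1.7.0"


def describe_environment():
    """Collect the Python and PyTorch versions without importing torch."""
    try:
        # Assumption: PyTorch is installed under the distribution name "torch".
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        torch_version = None
    return {"python": sys.version_info[:2], "torch": torch_version}


if __name__ == "__main__":
    env = describe_environment()
    found_python = ".".join(str(part) for part in env["python"])
    wanted_python = ".".join(str(part) for part in EXPECTED_PYTHON)
    print(f"python: found {found_python}, expected {wanted_python}")
    print(f"torch:  found {env['torch']}, expected {EXPECTED_TORCH}")
```

A torch value of None means no such distribution was found in the active environment.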

By default, data should be saved under the data/feat/{audio,label,visual} directories, and logs (with cache and checkpoints) are saved under the data/{cache,ckpt,log} directories. Using a symbolic link is recommended:

ln -s {path_to_your_data_directory} data
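
The layout described above can be checked before training. This is a minimal sketch under stated assumptions: the helper prepare_data_root is hypothetical and not part of the repository, and it assumes the output directories may be created ahead of time while the feature directories must already hold your data.

```python
from pathlib import Path

# Sub-directories named in this README, relative to the data root.
FEATURE_DIRS = ("feat/audio", "feat/label", "feat/visual")
OUTPUT_DIRS = ("cache", "ckpt", "log")


def prepare_data_root(root="data"):
    """Create the output directories and return the missing feature directories."""
    root = Path(root)
    for name in OUTPUT_DIRS:
        # Works through a symbolic link as long as its target exists.
        (root / name).mkdir(parents=True, exist_ok=True)
    return [name for name in FEATURE_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = prepare_data_root("data")
    if missing:
        print("Missing feature directories:", ", ".join(missing))
    else:
        print("Data layout looks complete.")
```

An empty return value means all three feature directories exist; it does not verify their contents.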

We use a single TITAN RTX for training, but GPUs with less memory also work with a smaller batch size (provided precomputed features are used).

Dataset

We plan to release the Pano-AVQA dataset publicly within this year, including Q&A annotations, precomputed features, etc. Please stay tuned!

Model

Training

The default configuration is provided in code/config.py. To run with this configuration:

python cli.py

To run with custom configuration, either modify code/config.py or execute:

python cli.py with {{flags_at_your_disposal}}
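
When launching several runs from a script, the command line above can be assembled programmatically. This is a minimal sketch: build_command is a hypothetical helper, and batch_size is only an assumed key name, so check code/config.py for the keys that actually exist.

```python
import sys


def build_command(overrides=None, command=None):
    """Return the argument list for `python cli.py [command] [with key=value ...]`."""
    args = [sys.executable, "cli.py"]
    if command:
        args.append(command)
    if overrides:
        args.append("with")
        args.extend(f"{key}={value}" for key, value in overrides.items())
    return args


if __name__ == "__main__":
    # `batch_size` is a hypothetical key; use the names defined in code/config.py.
    cmd = build_command({"batch_size": 16})
    print(" ".join(cmd))
```

The returned list can be passed to subprocess.run from the directory that contains cli.py.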

Inference

Model weights are saved under the ./data/log directory. To run inference only:

python cli.py eval with ckpt_file=../data/log/{experiment}/{ckpt}.pth
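
To fill in the {experiment} and {ckpt} placeholders, one option is to pick the most recently written checkpoint of a run. This is a minimal sketch: latest_checkpoint is a hypothetical helper, my_experiment is a placeholder directory name, and the newest file is not necessarily the best-performing one.

```python
from pathlib import Path


def latest_checkpoint(log_dir, experiment):
    """Return the most recently modified .pth file of an experiment, or None."""
    candidates = list((Path(log_dir) / experiment).glob("*.pth"))
    if not candidates:
        return None
    return max(candidates, key=lambda path: path.stat().st_mtime)


if __name__ == "__main__":
    # "my_experiment" is a placeholder for a directory name under data/log.
    ckpt = latest_checkpoint("../data/log", "my_experiment")
    if ckpt is None:
        print("No checkpoint found.")
    else:
        print(f"python cli.py eval with ckpt_file={ckpt}")
```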

Citation

If you find our work useful in your research, please consider citing:

@InProceedings{Yun2021PanoAVQA,
    author = {Yun, Heeseung and Yu, Youngjae and Yang, Wonsuk and Lee, Kangil and Kim, Gunhee},
    title = {Pano-AVQA: Grounded Audio-Visual Question Answering on 360$^\circ$ Videos},
    booktitle = {ICCV},
    year = {2021}
}

Contact

If you have any inquiries, please don't hesitate to contact us via heeseung.yun at vision.snu.ac.kr.

Owner

Heeseung Yun
