VLG-Net: Video-Language Graph Matching Networks for Video Grounding

Last update: Dec 04, 2022

Introduction

Official repository for VLG-Net: Video-Language Graph Matching Networks for Video Grounding. [ArXiv Preprint]

The paper was accepted to the first edition of the ICCV workshop: AI for Creative Video Editing and Understanding (CVEU).

Installation

Clone the repository and move into its folder:

git clone https://github.com/Soldelli/VLG-Net.git
cd VLG-Net

Install the environment:

conda env create -f environment.yml

If the installation fails, follow the instructions in doc/environment.md.
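
Once the environment is created and activated, a quick import check can confirm that the key packages resolved. The following is a minimal sketch, not part of the repository: `missing_packages` is a hypothetical helper, and the package names passed to it (`torch`, `numpy`) are assumptions about what environment.yml installs, so adjust them to match that file.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level packages in `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed package list; replace it with the packages in environment.yml.
    required = ["torch", "numpy"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

Run it with the interpreter of the activated environment; a non-empty list suggests falling back to the manual instructions in doc/environment.md.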

Data

Download the following resources and extract their contents into the destination folders listed in the table below.

Resource	Download Link	File Size	Destination Folder
StandfordCoreNLP-4.0.0	link	(~0.5GB)	`./datasets/`
TACoS	link	(~0.5GB)	`./datasets/`
ActivityNet-Captions	link	(~29GB)	`./datasets/`
DiDeMo	link	(~13GB)	`./datasets/`
GCNeXt warmup	link	(~0.1GB)	`./datasets/`
Pretrained Models	link	(~0.1GB)	`./models/`

The folder structure should be as follows:

.
├── configs
│
├── datasets
│   ├── activitynet1.3
│   │    ├── annotations
│   │    └── features
│   ├── didemo
│   │    ├── annotations
│   │    └── features
│   ├── tacos
│   │    ├── annotations
│   │    └── features
│   ├── gcnext_warmup
│   └── standford-corenlp-4.0.0
│
├── doc
│
├── lib
│   ├── config
│   ├── data
│   ├── engine
│   ├── modeling
│   ├── structures
│   └── utils
│
├── models
│   ├── activitynet
│   └── tacos
│
├── outputs
│
└── scripts
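
Before training, it can help to confirm that the downloaded resources landed in the expected places. The sketch below checks the data and model directories from the tree above; `EXPECTED_DIRS` and `missing_dirs` are hypothetical names introduced for illustration, and the script assumes it is run from the repository root.

```python
from pathlib import Path

# Directories copied from the folder tree above, relative to the repository root.
EXPECTED_DIRS = [
    "datasets/activitynet1.3/annotations",
    "datasets/activitynet1.3/features",
    "datasets/didemo/annotations",
    "datasets/didemo/features",
    "datasets/tacos/annotations",
    "datasets/tacos/features",
    "datasets/gcnext_warmup",
    "datasets/standford-corenlp-4.0.0",
    "models/activitynet",
    "models/tacos",
]


def missing_dirs(root="."):
    """Return the expected directories that do not exist under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(".")
    for d in missing:
        print("missing:", d)
    if not missing:
        print("Folder structure matches the expected layout.")
```

The check only looks at directory names, not at the files inside them, so an empty `features` folder would still pass.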

Training

Copy and paste the following commands into a terminal.

Load environment:

conda activate vlg

For ActivityNet-Captions dataset, run:

python train_net.py --config-file configs/activitynet.yml OUTPUT_DIR outputs/activitynet

For TACoS dataset, run:

python train_net.py --config-file configs/tacos.yml OUTPUT_DIR outputs/tacos
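
The two commands above differ only in the dataset name, so they can be generated from it. The following is a minimal sketch under stated assumptions: `build_train_command` and `train_all` are hypothetical helpers that do not exist in the repository, the trailing `OUTPUT_DIR <path>` pair is simply reproduced from the commands above, and the script is expected to run from the repository root with the `vlg` environment active.

```python
import subprocess
import sys

# Dataset names for which this README lists a training config.
DATASETS = ("activitynet", "tacos")


def build_train_command(dataset):
    """Build the training command shown above for one dataset."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return [
        sys.executable, "train_net.py",
        "--config-file", f"configs/{dataset}.yml",
        # Trailing KEY VALUE pair, copied from the README commands.
        "OUTPUT_DIR", f"outputs/{dataset}",
    ]


def train_all(dry_run=True):
    """Print each training command and, unless dry_run is set, execute it."""
    for name in DATASETS:
        cmd = build_train_command(name)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises if a run fails


if __name__ == "__main__":
    train_all(dry_run=True)  # only prints the commands
```

Calling `train_all(dry_run=False)` would launch the two runs one after the other and stop at the first failure.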

Evaluation

For simplicity, we provide scripts that automatically run inference with the pretrained models. See the scripts themselves for details if you want to run inference on a different model.

Load environment:

conda activate vlg

Then run one of the following scripts to launch the evaluation.

For ActivityNet-Captions dataset, run:

    bash scripts/activitynet.sh

For TACoS dataset, run:

    bash scripts/tacos.sh

Expected results:

After cleaning the code and fixing a couple of minor bugs, performance changed slightly (by at most 0.32 points) with respect to the numbers reported in the paper. See the tables below.

ActivityNet	Rank1@0.5	Rank1@0.7	Rank5@0.5	Rank5@0.7
Paper	46.32	29.82	77.15	63.33
Current	46.32	29.79	77.19	63.36

TACoS	Rank1@0.1	Rank1@0.3	Rank1@0.5	Rank5@0.1	Rank5@0.3	Rank5@0.5
Paper	57.21	45.46	34.19	81.80	70.38	56.56
Current	57.16	45.56	34.14	81.48	70.13	56.34
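
To see how small the gap between the paper and the current code is, the per-column differences can be computed directly from the two tables. The sketch below uses only the numbers listed above; `RESULTS` and `deltas` are names introduced here for illustration.

```python
# Scores copied from the two tables above (paper vs. current code).
RESULTS = {
    "ActivityNet": {
        "paper":   [46.32, 29.82, 77.15, 63.33],
        "current": [46.32, 29.79, 77.19, 63.36],
    },
    "TACoS": {
        "paper":   [57.21, 45.46, 34.19, 81.80, 70.38, 56.56],
        "current": [57.16, 45.56, 34.14, 81.48, 70.13, 56.34],
    },
}


def deltas(paper, current):
    """Per-column difference (current - paper), rounded to two decimals."""
    return [round(c - p, 2) for p, c in zip(paper, current)]


if __name__ == "__main__":
    for dataset, scores in RESULTS.items():
        diff = deltas(scores["paper"], scores["current"])
        largest = max(abs(d) for d in diff)
        print(f"{dataset}: deltas={diff}, largest absolute change={largest}")
```

With these numbers, the largest absolute change is 0.04 points on ActivityNet and 0.32 points on TACoS.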

Citation

If any part of our paper and code is helpful to your work, please cite with:

@inproceedings{soldan2021vlg,
  title={VLG-Net: Video-Language Graph Matching Network for Video Grounding},
  author={Soldan, Mattia and Xu, Mengmeng and Qu, Sisi and Tegner, Jesper and Ghanem, Bernard},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={3224--3234},
  year={2021}
}

Owner

Mattia Soldan
