Near-Duplicate Video Retrieval with Deep Metric Learning

Last update: Jan 24, 2022

Overview

This repository contains the TensorFlow implementation of the paper Near-Duplicate Video Retrieval with Deep Metric Learning. It provides code for training and evaluating a Deep Metric Learning (DML) network on the problem of Near-Duplicate Video Retrieval (NDVR). During training, the DML network is fed with video triplets produced by a triplet generator, and it is trained with the triplet loss function. The architecture of the network is displayed in the figure below. For evaluation, the mean Average Precision (mAP) and the Precision-Recall curve (PR-curve) are calculated. Two publicly available datasets are supported, namely VCDB and CC_WEB_VIDEO.
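For readers unfamiliar with the triplet loss, the following is a minimal NumPy sketch of the general idea, not the code used in this repository. It assumes squared Euclidean distances between L2-normalized embeddings; the function names and the default margin value are illustrative assumptions.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each vector along the last axis to unit L2 norm."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss for a batch of embeddings.

    Each argument has shape (batch, dim). The loss is zero once the
    anchor is closer to the positive than to the negative by at least
    `margin` (an assumed value, not taken from the repository).
    """
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    d_pos = np.sum((a - p) ** 2, axis=-1)  # anchor-positive distance
    d_neg = np.sum((a - n) ** 2, axis=-1)  # anchor-negative distance
    return np.maximum(0.0, d_pos - d_neg + margin)
```

A triplet whose positive already coincides with the anchor and whose negative is orthogonal to it yields zero loss, so it contributes nothing to training; only triplets that violate the margin produce a gradient.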

Prerequisites

Python
Tensorflow 1.xx

Getting started

Installation

Clone this repo:

git clone https://github.com/MKLab-ITI/ndvr-dml
cd ndvr-dml

You can install all the dependencies with either pip:

pip install -r requirements.txt

or conda:

conda install --file requirements.txt

Triplet generation

Run the triplet generation process for each dataset, VCDB and CC_WEB_VIDEO. This process will generate two files for each dataset:

the global feature vectors for each video in the dataset:
<output_dir>/<dataset>_features.npy
the generated triplets:
<output_dir>/<dataset>_triplets.npy

To execute the triplet generation process, do as follows:

The code does not extract features from videos. Instead, the .npy files of the already extracted features have to be provided. You may use the Intermediate-CNN-Features tool (see Related Projects) to do so.

Create a file that contains the video id and the path of the feature file for each video in the dataset being processed. Each line of the file must contain the video id (basename of the video file) and the full path to the corresponding .npy file of its features, separated by a tab character (\t). Example:

  23254771545e5d278548ba02d25d32add952b2a4	features/23254771545e5d278548ba02d25d32add952b2a4.npy
  468410600142c136d707b4cbc3ff0703c112575d	features/468410600142c136d707b4cbc3ff0703c112575d.npy
  67f1feff7f624cf0b9ac2ebaf49f547a922b4971	features/67f1feff7f624cf0b9ac2ebaf49f547a922b4971.npy
                                           ...
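A file in this format can be produced with a few lines of Python. The sketch below is not part of the repository; the helper name write_feature_file_list is hypothetical, and it assumes that all feature files sit in a single directory and are named <video id>.npy.

```python
import os


def write_feature_file_list(feature_dir, output_file):
    """Write one tab-separated '<video id> <path>' line per .npy file.

    Assumes every feature file in `feature_dir` is named
    '<video id>.npy'. Returns the number of lines written.
    """
    count = 0
    with open(output_file, "w") as out:
        for name in sorted(os.listdir(feature_dir)):
            video_id, ext = os.path.splitext(name)
            if ext != ".npy":
                continue  # skip anything that is not a feature file
            path = os.path.abspath(os.path.join(feature_dir, name))
            out.write(f"{video_id}\t{path}\n")
            count += 1
    return count
```

For example, write_feature_file_list("features/", "vcdb_feature_files.txt") would list every .npy file found under features/ with its absolute path.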

Run the triplet generator and provide the generated file from the previous step, the name of the processed dataset, and the output directory.

python triplet_generator.py --dataset vcdb --feature_files vcdb_feature_files.txt --output_dir output_data/

The global video features extracted based on the Intermediate CNN Features, and their generated triplets for both datasets can be found here.

DML training

Train the DML network by providing the global features and triplets of VCDB, and a directory to save the trained model.

python train_dml.py --train_set output_data/vcdb_features.npy --triplets output_data/vcdb_triplets.npy --model_path model/

Triplets from CC_WEB_VIDEO can be injected if the global features and triplets of the evaluation set are provided.

python train_dml.py --evaluation_set output_data/cc_web_video_features.npy --evaluation_triplets output_data/cc_web_video_triplets.npy --train_set output_data/vcdb_features.npy --triplets output_data/vcdb_triplets.npy --model_path model/

Evaluation

Evaluate the performance of the system by providing the trained model path and the global features of the CC_WEB_VIDEO.

python evaluation.py --fusion Early --evaluation_set output_data/cc_vgg_features.npy --model_path model/

python evaluation.py --fusion Late --evaluation_features cc_web_video_feature_files.txt --evaluation_set output_data/cc_vgg_features.npy --model_path model/
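The difference between the two fusion settings can be illustrated roughly as follows. This sketch assumes that early fusion averages the frame descriptors into one global descriptor before it is passed through the network, while late fusion passes every frame descriptor through the network and averages the resulting embeddings. The embed argument is a hypothetical stand-in for the trained DML network, and none of these function names come from the repository.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each vector along the last axis to unit L2 norm."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def early_fusion(frame_features, embed):
    """Average frame descriptors first, then embed the single result."""
    video_descriptor = l2_normalize(frame_features.mean(axis=0))
    return l2_normalize(embed(video_descriptor))


def late_fusion(frame_features, embed):
    """Embed every frame descriptor, then average the embeddings."""
    frame_embeddings = l2_normalize(embed(frame_features))
    return l2_normalize(frame_embeddings.mean(axis=0))


# Hypothetical stand-in for the trained network: a fixed random projection.
rng = np.random.default_rng(0)
weights = rng.standard_normal((8, 4))


def toy_embed(x):
    return np.tanh(x @ weights)


frames = rng.standard_normal((5, 8))  # 5 frames, 8-dimensional descriptors
early = early_fusion(frames, toy_embed)
late = late_fusion(frames, toy_embed)
```

Under this reading, late fusion needs the per-frame features of each video, which would explain why the Late command additionally takes the list of feature files.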

The evaluation script returns the mAP and the PR-curve.
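As a reminder of what mAP measures, here is a minimal sketch of the standard average precision computation over ranked results. It is an illustration under the usual definition and is not necessarily identical to what evaluation.py does; the function names are assumptions.

```python
import numpy as np


def average_precision(ranked_relevance):
    """Average precision for one query.

    `ranked_relevance` lists the retrieved videos in ranked order,
    with a truthy value where a video is a true near-duplicate.
    """
    rel = np.asarray(ranked_relevance, dtype=bool)
    if not rel.any():
        return 0.0  # no relevant video retrieved for this query
    hits = np.cumsum(rel)                 # relevant videos seen so far
    ranks = np.arange(1, len(rel) + 1)    # 1-based rank positions
    precision_at_k = hits / ranks
    # Average the precision at the positions of the relevant videos.
    return float(precision_at_k[rel].mean())


def mean_average_precision(rankings):
    """Mean of the per-query average precision values."""
    return float(np.mean([average_precision(r) for r in rankings]))
```

For a query whose ranked results are relevant at positions 1 and 3 out of 4, the average precision is (1/1 + 2/3) / 2, which is about 0.83.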

Citation

If you use this code for your research, please cite our paper.

@inproceedings{kordopatis2017dml,
  title={Near-Duplicate Video Retrieval with Deep Metric Learning},
  author={Kordopatis-Zilos, Giorgos and Papadopoulos, Symeon and Patras, Ioannis and Kompatsiaris, Yiannis},
  booktitle={2017 IEEE International Conference on Computer Vision Workshop (ICCVW)},
  year={2017},
}

Related Projects

ViSiL
Intermediate-CNN-Features
FIVR-200K

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Contact for further details about the project

Giorgos Kordopatis-Zilos
Symeon Papadopoulos
