[arXiv22] Disentangled Representation Learning for Text-Video Retrieval

Overview

Disentangled Representation Learning for Text-Video Retrieval

MSR-VTT DiDeMo

This is a PyTorch implementation of the paper Disentangled Representation Learning for Text-Video Retrieval:

@Article{DRLTVR2022,
  author  = {Qiang Wang and Yanhao Zhang and Yun Zheng and Pan Pan and Xian-Sheng Hua},
  journal = {arXiv:2203.07111},
  title   = {Disentangled Representation Learning for Text-Video Retrieval},
  year    = {2022},
}

Catalog

  • Setup
  • Fine-tuning code
  • Visualization demo

Setup

Setup code environment

git clone https://github.com/foolwood/DRL.git
cd DRL
conda create -n drl python=3.9
conda activate drl
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip install torch==1.8.1+cu102 torchvision==0.9.1+cu102 -f https://download.pytorch.org/whl/torch_stable.html

Download CLIP Model (as pretraining)

cd tvr/models
wget https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt
# wget https://openaipublic.azureedge.net/clip/models/5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/ViT-B-16.pt
# wget https://openaipublic.azureedge.net/clip/models/b8cca3fd41ae0c99ba7e8951adf17d267cdb84cd88be6f7c2e0eca1737a03836/ViT-L-14.pt

Download Datasets

cd data/MSR-VTT
wget https://www.robots.ox.ac.uk/~maxbain/frozen-in-time/data/MSRVTT.zip ; unzip MSRVTT.zip
mv MSRVTT/videos/all ./videos ; mv MSRVTT/annotation/MSR_VTT.json ./anns/MSRVTT_data.json

Fine-tuning code

  • Train on MSR-VTT 1k.
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 \
main.py --do_train 1 --workers 8 --n_display 50 \
--epochs 5 --lr 1e-4 --coef_lr 1e-3 --batch_size 128 --batch_size_val 128 \
--anno_path data/MSR-VTT/anns --video_path data/MSR-VTT/videos --datatype msrvtt \
--max_words 32 --max_frames 12 --video_framerate 1 \
--base_encoder ViT-B/32 --agg_module seqTransf \
--interaction wti --wti_arch 2 --cdcr 3 --cdcr_alpha1 0.11 --cdcr_alpha2 0.0 --cdcr_lambda 0.001 \
--output_dir ckpts/ckpt_msrvtt_wti_cdcr

Reproduce the ablation experiments scripts

configs
feature gpus Text-Video Video-Text train time (h)
[email protected] [email protected] [email protected] MdR MnR [email protected] [email protected] [email protected] MdR MnR
CLIP4Clip ViT/B-32 4 42.8 72.1 81.4 2.0 16.3 44.1 70.5 80.5 2.0 11.8 10.5
zero-shot ViT/B-32 4 31.1 53.7 63.4 4.0 41.6 26.5 50.1 61.7 5.0 39.9 -
Interaction
DP+None ViT/B-32 4 42.9 70.6 81.4 2.0 15.4 43.0 71.1 81.1 2.0 11.8 2.5
DP+seqTransf ViT/B-32 4 42.8 71.1 81.1 2.0 15.6 44.1 70.9 80.9 2.0 11.7 2.6
XTI+None ViT/B-32 4 40.5 71.1 82.6 2.0 13.6 42.7 70.8 80.2 2.0 12.5 14.3
XTI+seqTransf ViT/B-32 4 42.4 71.3 80.9 2.0 15.2 40.1 69.2 79.6 2.0 15.8 16.8
TI+seqTransf ViT/B-32 4 44.8 73.0 82.2 2.0 13.4 42.6 72.7 82.8 2.0 9.1 2.6
WTI+seqTransf ViT/B-32 4 46.6 73.4 83.5 2.0 13.0 45.4 73.4 81.9 2.0 9.2 2.6
Channel DeCorrelation Regularization
DP+seqTransf+CDCR ViT/B-32 4 43.9 71.1 81.2 2.0 15.3 42.3 70.3 81.1 2.0 11.4 2.6
TI+seqTransf+CDCR ViT/B-32 4 45.8 73.0 81.9 2.0 12.8 43.3 71.8 82.7 2.0 8.9 2.6
WTI+seqTransf+CDCR ViT/B-32 4 47.6 73.4 83.3 2.0 12.8 45.1 72.9 83.5 2.0 9.2 2.6

Note: the performances are slight boosts due to new hyperparameters.

Visualization demo

Run our visualization demo using matplotlib (no GPU needed):

License

See LICENSE for details.

Acknowledgments

Our code is partly based on CLIP4Clip.

Owner
Qiang Wang
Computer Vision & Machine Learning
Qiang Wang
ReferFormer - Official Implementation of ReferFormer

The official implementation of the paper: Language as Queries for Referring Video Object Segmentation Language as Queries for Referring Video Object S

Jonas Wu 232 Dec 29, 2022
We are More than Our JOints: Predicting How 3D Bodies Move

We are More than Our JOints: Predicting How 3D Bodies Move Citation This repo contains the official implementation of our paper MOJO: @inproceedings{Z

72 Oct 20, 2022
ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation

ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation This repository contains the source code of our paper, ESPNet (acc

Sachin Mehta 515 Dec 13, 2022
Framework for evaluating ANNS algorithms on billion scale datasets.

Billion-Scale ANN http://big-ann-benchmarks.com/ Install The only prerequisite is Python (tested with 3.6) and Docker. Works with newer versions of Py

Harsha Vardhan Simhadri 132 Dec 24, 2022
MvtecAD unsupervised Anomaly Detection

MvtecAD unsupervised Anomaly Detection This respository is the unofficial implementations of DFR: Deep Feature Reconstruction for Unsupervised Anomaly

0 Feb 25, 2022
AlphaNet Improved Training of Supernet with Alpha-Divergence

AlphaNet: Improved Training of Supernet with Alpha-Divergence This repository contains our PyTorch training code, evaluation code and pretrained model

Facebook Research 87 Oct 10, 2022
BalaGAN: Image Translation Between Imbalanced Domains via Cross-Modal Transfer

BalaGAN: Image Translation Between Imbalanced Domains via Cross-Modal Transfer Project Page | Paper | Video State-of-the-art image-to-image translatio

47 Dec 06, 2022
Painting app using Python machine learning and vision technology.

AI Painting App We are making an app that will track our hand and helps us to draw from that. We will be using the advance knowledge of Machine Learni

Badsha Laskar 3 Oct 03, 2022
Example how to deploy deep learning model with aiohttp.

aiohttp-demos Demos for aiohttp project. Contents Imagetagger Deep Learning Image Classifier URL shortener Toxic Comments Classifier Moderator Slack B

aio-libs 661 Jan 04, 2023
Voila - Voilà turns Jupyter notebooks into standalone web applications

Rendering of live Jupyter notebooks with interactive widgets. Introduction Voilà turns Jupyter notebooks into standalone web applications. Unlike the

Voilà Dashboards 4.5k Jan 03, 2023
The code repository for "RCNet: Reverse Feature Pyramid and Cross-scale Shift Network for Object Detection" (ACM MM'21)

RCNet: Reverse Feature Pyramid and Cross-scale Shift Network for Object Detection (ACM MM'21) By Zhuofan Zong, Qianggang Cao, Biao Leng Introduction F

TempleX 9 Jul 30, 2022
Read and write layered TIFF ImageSourceData and ImageResources tags

Read and write layered TIFF ImageSourceData and ImageResources tags Psdtags is a Python library to read and write the Adobe Photoshop(r) specific Imag

Christoph Gohlke 4 Feb 05, 2022
Repository for training material for the 2022 SDSC HPC/CI User Training Course

hpc-training-2022 Repository for training material for the 2022 SDSC HPC/CI Training Series HPC/CI Training Series home https://www.sdsc.edu/event_ite

sdsc-hpc-training-org 21 Jul 27, 2022
The code of “Similarity Reasoning and Filtration for Image-Text Matching” [AAAI2021]

SGRAF PyTorch implementation for AAAI2021 paper of “Similarity Reasoning and Filtration for Image-Text Matching”. It is built on top of the SCAN and C

Ronnie_IIAU 149 Dec 22, 2022
An educational tool to introduce AI planning concepts using mobile manipulator robots.

JEDAI Explains Decision-Making AI Virtual Machine Image The recommended way of using JEDAI is to use pre-configured Virtual Machine image that is avai

Autonomous Agents and Intelligent Robots 13 Nov 15, 2022
Pytorch implementation of the paper "Class-Balanced Loss Based on Effective Number of Samples"

Class-balanced-loss-pytorch Pytorch implementation of the paper Class-Balanced Loss Based on Effective Number of Samples presented at CVPR'19. Yin Cui

Vandit Jain 697 Dec 29, 2022
Image Fusion Transformer

Image-Fusion-Transformer Platform Python 3.7 Pytorch =1.0 Training Dataset MS-COCO 2014 (T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ram

Vibashan VS 68 Dec 23, 2022
LSUN Dataset Documentation and Demo Code

LSUN Please check LSUN webpage for more information about the dataset. Data Release All the images in one category are stored in one lmdb database fil

Fisher Yu 426 Jan 02, 2023
This repo provides the base code for pytorch-lightning and weight and biases simultaneous integration.

Write your model faster with pytorch-lightning-wadb-code-backbone This repository provides the base code for pytorch-lightning and weight and biases s

9 Mar 29, 2022
Rainbow: Combining Improvements in Deep Reinforcement Learning

Rainbow Rainbow: Combining Improvements in Deep Reinforcement Learning [1]. Results and pretrained models can be found in the releases. DQN [2] Double

Kai Arulkumaran 1.4k Dec 29, 2022