The official implementation of the CVPR 2021 paper: Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation.

Last update: Nov 14, 2022

Overview

Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation

This repository is the official implementation of the CVPR 2021 paper: Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation.

Requirements

TensorFlow 1.15
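
Because the code targets the TensorFlow 1.15 series, it can help to check the installed version before training. The following is a minimal sketch, not part of the repository; the helper name `is_tf_1_15` is hypothetical and only inspects a version string.

```python
def is_tf_1_15(version: str) -> bool:
    """Return True if a version string such as '1.15.5' is in the 1.15 series.

    Hypothetical helper for illustration; it is not provided by this repository.
    """
    parts = version.split(".")
    # Compare the major and minor components exactly, so '1.150' is rejected.
    return len(parts) >= 2 and parts[0] == "1" and parts[1] == "15"
```

With TensorFlow installed, the check would be called as `is_tf_1_15(tf.__version__)` after `import tensorflow as tf`.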

Training

To train the NCE model(s) in the paper, run this command:

python train_nce_distill_model.py \
  --region_feat_path=region_features.hdf5 \
  --phrase_feat_path=phrase_features.hdf5 \
  --glove_path=glove.hdf5

To train the NCE+Distill model(s) in the paper, run the same script with the additional --phrase_to_label_json argument:

python train_nce_distill_model.py \
  --region_feat_path=region_features.hdf5 \
  --phrase_feat_path=phrase_features.hdf5 \
  --glove_path=glove.hdf5 \
  --phrase_to_label_json=phrase_to_label.json
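
The two training commands above differ only in the --phrase_to_label_json argument. The sketch below builds either command line from Python; the function name `build_train_command` is hypothetical, and the idea that omitting the argument selects the NCE-only setting is inferred from the two commands rather than confirmed from the script itself.

```python
from typing import List, Optional


def build_train_command(
    region_feat_path: str,
    phrase_feat_path: str,
    glove_path: str,
    phrase_to_label_json: Optional[str] = None,
) -> List[str]:
    """Build the argument list for train_nce_distill_model.py.

    Hypothetical helper: only the flags shown in this README are used.
    """
    cmd = [
        "python",
        "train_nce_distill_model.py",
        f"--region_feat_path={region_feat_path}",
        f"--phrase_feat_path={phrase_feat_path}",
        f"--glove_path={glove_path}",
    ]
    # Assumption: the NCE+Distill command is the NCE command plus this flag.
    if phrase_to_label_json is not None:
        cmd.append(f"--phrase_to_label_json={phrase_to_label_json}")
    return cmd


nce_cmd = build_train_command(
    "region_features.hdf5", "phrase_features.hdf5", "glove.hdf5"
)
distill_cmd = build_train_command(
    "region_features.hdf5",
    "phrase_features.hdf5",
    "glove.hdf5",
    phrase_to_label_json="phrase_to_label.json",
)
```

Either list can be passed to `subprocess.run(cmd, check=True)` from the repository root.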

Evaluation

To evaluate the model on Flickr30K, run:

python eval_model.py \
  --region_feat_path=region_features_test.hdf5 \
  --phrase_feat_path=phrase_features_test.hdf5 \
  --glove_path=glove.hdf5 \
  --restore_path=checkpoint.meta

Pre-trained Models

You can download pretrained models using Res101 VG features here:

You can also find the features on Flickr30K test split here.

The pretrained models achieve the following performance on Flickr30K test split:

Model Name	R@1	R@5	R@10
NCE+Distill	0.5310	0.7394	0.7875
NCE	0.5135	0.7338	0.7833
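
The scores in the table are fractions of test phrases that are grounded correctly. As an illustration of how a score of this kind might be computed, the sketch below counts a phrase as a hit when any of its top-k ranked boxes overlaps the ground-truth box with IoU of at least 0.5. The IoU threshold, the box format, and the function names are assumptions for illustration, not the logic of eval_model.py.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(
    ranked_boxes: Sequence[Sequence[Box]],
    gt_boxes: Sequence[Box],
    k: int,
    iou_threshold: float = 0.5,
) -> float:
    """Fraction of phrases with a top-k box matching the ground truth."""
    if not gt_boxes:
        return 0.0
    hits = 0
    for candidates, gt in zip(ranked_boxes, gt_boxes):
        # Only the k highest-ranked boxes for this phrase are considered.
        if any(iou(box, gt) >= iou_threshold for box in candidates[:k]):
            hits += 1
    return hits / len(gt_boxes)


# Toy data: two phrases, each with two ranked candidate boxes.
gt = [(0, 0, 10, 10), (0, 0, 10, 10)]
ranked = [
    [(0, 0, 10, 10), (50, 50, 60, 60)],
    [(50, 50, 60, 60), (0, 0, 10, 9)],
]
```

On the toy data, only the first phrase is matched by its top-ranked box, while both phrases are matched once the second-ranked box is also considered.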

Citation

If you use our implementation in your research or wish to refer to the results published in our paper, please use the following BibTeX entry.

@InProceedings{Wang_2021_CVPR,
    author    = {Wang, Liwei and Huang, Jing and Li, Yin and Xu, Kun and Yang, Zhengyuan and Yu, Dong},
    title     = {Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {14090-14100}
}
