PyTorch Code for the paper "VSE++: Improving Visual-Semantic Embeddings with Hard Negatives"

Last update: Dec 05, 2022

Overview

Improving Visual-Semantic Embeddings with Hard Negatives

Code for the image-caption retrieval methods from VSE++: Improving Visual-Semantic Embeddings with Hard Negatives, F. Faghri, D. J. Fleet, J. R. Kiros, S. Fidler, Proceedings of the British Machine Vision Conference (BMVC), 2018. (BMVC Spotlight)

Dependencies

We recommend using Anaconda to install the following packages.

Python 2.7 (check out branch python3)
PyTorch (>0.2) (check out branch pytorch4.1)
NumPy (>1.12.1)
TensorBoard
pycocotools
torchvision
matplotlib
Punkt Sentence Tokenizer:

import nltk
nltk.download()
> d punkt

Download data

Download the dataset files and pre-trained models. We use splits produced by Andrej Karpathy. The precomputed image features are from here and here. To use full image encoders, download the images from their original sources here, here and here.

wget http://www.cs.toronto.edu/~faghri/vsepp/vocab.tar
wget http://www.cs.toronto.edu/~faghri/vsepp/data.tar
wget http://www.cs.toronto.edu/~faghri/vsepp/runs.tar

We refer to the path of the files extracted from data.tar as $DATA_PATH and the path of the files extracted from runs.tar as $RUN_PATH. Extract vocab.tar to the ./vocab directory.
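The extraction step can also be scripted. The following is a minimal sketch using only the Python standard library; the helper name `extract_archive` is hypothetical and not part of this repository. Whether each archive already contains a top-level directory is not stated here, so the helper returns the top-level names it found, which lets you check the layout before setting $DATA_PATH and $RUN_PATH.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a tar archive into dest_dir and return its top-level entry names.

    Hypothetical helper for illustration; it is not part of the vsepp code.
    Only extract archives from a source you trust.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path) as tar:
        names = tar.getnames()
        tar.extractall(dest)
    # Report the first path component of every member, without duplicates.
    return sorted({name.split("/")[0] for name in names})
```

For example, `extract_archive("data.tar", "some/dir")` unpacks the data archive and reports what landed directly under `some/dir`; the destination directory here is an assumption, so choose whatever location you want $DATA_PATH to point to.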

Update: The vocabulary was originally built using all sets (including test set captions). Please see issue #29 for details. Please consider not using test set captions if building upon this project.

Evaluate pre-trained models

python -c "\
from vocab import Vocabulary
import evaluation
evaluation.evalrank('$RUN_PATH/coco_vse++/model_best.pth.tar', data_path='$DATA_PATH', split='test')"

To do cross-validation on MSCOCO, pass fold5=True with a model trained using --data_name coco.

Training new models

Run train.py:

python train.py --data_path "$DATA_PATH" --data_name coco_precomp --logger_name runs/coco_vse++ --max_violation
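The --max_violation flag in the command above switches the ranking loss from a sum over all negatives to the single hardest negative, which is the idea behind the paper's title. The sketch below illustrates that difference in plain Python on a small score matrix. It is not the repository's implementation (which operates on PyTorch tensors); the function name `contrastive_loss` and the margin of 0.2 are assumptions made for this example.

```python
def contrastive_loss(scores, margin=0.2, max_violation=False):
    """Hinge-based ranking loss over a square image-caption score matrix.

    scores[i][j] is the similarity of image i and caption j, so the
    diagonal holds the matching pairs. Illustrative sketch only.
    """
    n = len(scores)
    total = 0.0
    for i in range(n):
        pos = scores[i][i]
        # Margin violations of non-matching captions for image i (row i).
        cap_costs = [max(0.0, margin + scores[i][j] - pos)
                     for j in range(n) if j != i]
        # Margin violations of non-matching images for caption i (column i).
        img_costs = [max(0.0, margin + scores[j][i] - pos)
                     for j in range(n) if j != i]
        if max_violation:
            # Keep only the hardest negative in each direction.
            total += max(cap_costs, default=0.0) + max(img_costs, default=0.0)
        else:
            # Sum over all negatives.
            total += sum(cap_costs) + sum(img_costs)
    return total


# Made-up similarity scores for three image-caption pairs.
scores = [
    [0.9, 0.8, 0.85],
    [0.2, 0.7, 0.3],
    [0.3, 0.1, 0.6],
]
sum_loss = contrastive_loss(scores, max_violation=False)
max_loss = contrastive_loss(scores, max_violation=True)
print(sum_loss, max_loss)
```

With these scores the first image has two captions that violate the margin; the sum variant counts both, while the max variant counts only the larger violation, so the second value printed is smaller.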

Arguments used to train pre-trained models:

Method	Arguments
VSE0	`--no_imgnorm`
VSE++	`--max_violation`
Order0	`--measure order --use_abs --margin .05 --learning_rate .001`
Order++	`--measure order --max_violation`
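When launching several of these configurations, it can help to keep the table in one place in a script. The following is a small sketch that assembles the train.py command line for a given method. The helper `build_train_command` and the default logger directory naming are hypothetical; only the flags themselves are taken from the table and the training command above.

```python
import shlex

# Extra arguments per method, copied from the table above.
METHOD_ARGS = {
    "VSE0": "--no_imgnorm",
    "VSE++": "--max_violation",
    "Order0": "--measure order --use_abs --margin .05 --learning_rate .001",
    "Order++": "--measure order --max_violation",
}


def build_train_command(method, data_path, data_name="coco_precomp",
                        logger_name=None):
    """Return the train.py invocation for `method` as an argument list."""
    if method not in METHOD_ARGS:
        raise ValueError("unknown method: %r" % (method,))
    if logger_name is None:
        # Hypothetical naming scheme, chosen only for this sketch.
        logger_name = "runs/" + method.lower()
    command = [
        "python", "train.py",
        "--data_path", data_path,
        "--data_name", data_name,
        "--logger_name", logger_name,
    ]
    return command + shlex.split(METHOD_ARGS[method])


if __name__ == "__main__":
    # Print one command per method; "/path/to/data" is a placeholder.
    for name in METHOD_ARGS:
        print(" ".join(build_train_command(name, "/path/to/data")))
```

The returned list can be passed to `subprocess.run` directly, which avoids shell quoting problems with names such as `runs/coco_vse++`.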

Reference

If you found this code useful, please cite the following paper:

@inproceedings{faghri2018vse++,
  title={VSE++: Improving Visual-Semantic Embeddings with Hard Negatives},
  author={Faghri, Fartash and Fleet, David J and Kiros, Jamie Ryan and Fidler, Sanja},
  booktitle = {Proceedings of the British Machine Vision Conference ({BMVC})},
  url = {https://github.com/fartashf/vsepp},
  year={2018}
}

License

Apache License 2.0

Owner

Fartash Faghri
