Negative Sample is Negative in Its Own Way: Tailoring Negative Sentences for Image-Text Retrieval

Last update: Nov 07, 2022

Overview

NSGDC

Some code in this repo is copied or modified from open-source implementations made available by UNITER, PyTorch, HuggingFace, OpenNMT, and Nvidia. The image features are extracted using BUTD.

Requirements

The setup follows UNITER. We provide a Docker image for easier reproduction. Please install the following:

Our scripts require the user to have docker group membership so that docker commands can be run without sudo. We only support Linux with NVIDIA GPUs. We test on Ubuntu 18.04 and V100 cards. We use mixed-precision training, so GPUs with Tensor Cores are recommended.
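
The docker group and GPU requirements can be checked before anything is launched. The following is a minimal sketch and not part of this repo: `has_group` and `preflight` are hypothetical helper names, and the check only looks for docker group membership and for `nvidia-smi` on the PATH. It does not verify driver versions or Tensor Core support.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds if group name $2 appears in the
# space-separated group list $1 (exact match, not substring).
has_group() {
    local groups_list="$1" wanted="$2" g
    for g in $groups_list; do
        if [ "$g" = "$wanted" ]; then
            return 0
        fi
    done
    return 1
}

# Hypothetical pre-flight check: prints one status line per requirement
# and returns non-zero if any requirement looks unmet.
preflight() {
    local status=0
    if has_group "$(id -nG)" docker; then
        echo "docker group: ok"
    else
        echo "docker group: missing (docker commands would need sudo)"
        status=1
    fi
    if command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi: found"
    else
        echo "nvidia-smi: not found (an NVIDIA GPU and driver are required)"
        status=1
    fi
    return "$status"
}
```

After sourcing these definitions, running `preflight` in the shell that will start the container reports anything that looks missing.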

Image-Text Retrieval

Download Data

bash scripts/download_itm.sh $PATH_TO_STORAGE

Launch the Docker Container

# docker image should be automatically pulled
source launch_container.sh $PATH_TO_STORAGE/txt_db $PATH_TO_STORAGE/img_db \
$PATH_TO_STORAGE/finetune $PATH_TO_STORAGE/pretrained

In case you would like to reproduce the whole preprocessing pipeline.

The launch script respects the $CUDA_VISIBLE_DEVICES environment variable. Note that the source code is mounted into the container under /src instead of being built into the image, so that user modifications are reflected without rebuilding the image. (Data folders are mounted into the container separately for flexibility in folder structure.)
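
Because the four data folders are passed to the launch script as separate paths, it can help to confirm they exist under the storage root first. The following is a minimal sketch under that assumption: `missing_storage_dirs` is a hypothetical helper, not part of this repo, and the folder names are taken from the launch command above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints the name of each expected data folder
# that does not exist under the storage root given as $1.
# Prints nothing when all four folders are present.
missing_storage_dirs() {
    local root="$1" d
    for d in txt_db img_db finetune pretrained; do
        [ -d "$root/$d" ] || echo "$d"
    done
}
```

A typical use would be to run `missing_storage_dirs "$PATH_TO_STORAGE"` after the download step. To limit the container to specific GPUs, `export CUDA_VISIBLE_DEVICES=0,1` (or another device list) in the same shell before sourcing `launch_container.sh`.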

Image-Text Retrieval (Flickr30k)

# Train with the base setting
bash run_cmds/tran_pnsgd_base_flickr.sh
bash run_cmds/tran_pnsgd2_base_flickr.sh

# Train with the large setting
bash run_cmds/tran_pnsgd_large_flickr.sh
bash run_cmds/tran_pnsgd2_large_flickr.sh

Image-Text Retrieval (COCO)

# Train with the base setting
bash run_cmds/tran_pnsgd_base_coco.sh
bash run_cmds/tran_pnsgd2_base_coco.sh

# Train with the large setting
bash run_cmds/tran_pnsgd_large_coco.sh
bash run_cmds/tran_pnsgd2_large_coco.sh

Run Inference

bash run_cmds/inf_nsgd.sh

Results

Our models achieve the following performance.

MS-COCO

Model	Image-to-Text			Text-to-Image
Model	R@1	R@5	R@10	R@1	R@5	R@10
NSGDC-Base	66.6	88.6	94.0	51.6	79.1	87.5
NSGDC-Large	67.8	89.6	94.2	53.3	80.0	88.0

Flickr30K

Model	Image-to-Text			Text-to-Image
Model	R@1	R@5	R@10	R@1	R@5	R@10
NSGDC-Base	87.9	98.1	99.3	74.5	93.3	96.3
NSGDC-Large	90.6	98.8	99.1	77.3	94.3	97.3

Owner

Zhihao Fan
