PyTorch code for the paper "Complementarity is the King: Multi-modal and Multi-grained Hierarchical Semantic Enhancement Network for Cross-modal Retrieval".

Last update: Dec 23, 2022

Overview

Complementarity is the King: Multi-modal and Multi-grained Hierarchical Semantic Enhancement Network for Cross-modal Retrieval (M²HSE)

PyTorch code for M²HSE. The local-level subnetwork of M²HSE is built on top of VSESC.

Xinlei Pei, Zheng Liu, Shaojing Yuan, Shanshan Gao, Huijian Han and Caiming Zhang. "Complementarity is the King: Multi-modal and Multi-grained Hierarchical Semantic Enhancement Network for Cross-modal Retrieval".

Introduction

We provide demo code for the Corel 5K dataset, including the details of the training process for both the global-level subnetwork and the local-level subnetwork.

Requirements

We recommend the following dependencies:

Python 3.6
PyTorch (1.3.1)
NumPy (1.19.2)
Punkt Sentence Tokenizer:

import nltk
nltk.download('punkt')  # same as entering "d punkt" in the interactive nltk.download() prompt
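The Punkt tokenizer is used to split the texts into words before they are mapped to vocabulary indices (the training commands read a vocabulary from ./vocab). The sketch below shows one way such a word-to-index vocabulary could be built. It is a hypothetical helper, not the repository's code: the names build_vocab and encode, the special tokens <pad>, <start>, <end> and <unk>, and the frequency threshold follow the VSE++ convention and are assumptions. It defaults to str.split so it runs without NLTK; in practice you would pass nltk.word_tokenize.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List

# Hypothetical special tokens (VSE++-style convention, assumed).
SPECIAL_TOKENS = ["<pad>", "<start>", "<end>", "<unk>"]


def build_vocab(
    texts: Iterable[str],
    threshold: int = 1,
    tokenize: Callable[[str], List[str]] = str.split,
) -> Dict[str, int]:
    """Map every word seen at least `threshold` times to an integer id.

    Special tokens get ids 0..3; remaining words are added in sorted order
    so the mapping is deterministic.
    """
    counts: Counter = Counter()
    for text in texts:
        counts.update(tokenize(text.lower()))

    vocab: Dict[str, int] = {}
    for token in SPECIAL_TOKENS:
        vocab[token] = len(vocab)
    for word, count in sorted(counts.items()):
        if count >= threshold and word not in vocab:
            vocab[word] = len(vocab)
    return vocab


def encode(
    text: str,
    vocab: Dict[str, int],
    tokenize: Callable[[str], List[str]] = str.split,
) -> List[int]:
    """Convert a text into ids, wrapped in <start>/<end>; unknown words -> <unk>."""
    unk = vocab["<unk>"]
    ids = [vocab.get(tok, unk) for tok in tokenize(text.lower())]
    return [vocab["<start>"]] + ids + [vocab["<end>"]]
```

For example, once Punkt is installed, build_vocab(texts, threshold=4, tokenize=nltk.word_tokenize) would build the mapping with NLTK's word tokenizer; the threshold value here is illustrative only.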

Download data

The raw images and the corresponding texts can be downloaded from here. Note that we performed data cleaning on this dataset; the specific operations are described in the paper.

In addition: 1) to extract the fine-grained visual features, each raw image is divided uniformly into 3*3 blocks; 2) we adopt AlexNet, pre-trained on ImageNet, to extract the CNN features; 3) the text data is provided in ./data/coarse-grained-data/ and ./data/fine-grained-data/. For data preparation, you therefore have the following two options:

Download the above raw data and extract the corresponding features according to the strategy we introduced in the paper.
Contact us for relevant data. (Email: [email protected])
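The 3*3 block division used for the fine-grained visual features can be sketched with NumPy as follows. This is an illustrative helper (split_into_blocks is a hypothetical name), assuming each image is loaded as an H x W x C array; the exact cropping and resizing used in the paper may differ.

```python
import numpy as np


def split_into_blocks(image: np.ndarray, rows: int = 3, cols: int = 3) -> list:
    """Split an H x W (x C) image uniformly into rows * cols blocks.

    Blocks are returned in row-major order (top-left block first). When H or W
    is not divisible by rows/cols, np.array_split makes the first blocks one
    pixel larger, so every pixel belongs to exactly one block.
    """
    blocks = []
    # First cut the image into horizontal bands, then cut each band into columns.
    for band in np.array_split(image, rows, axis=0):
        blocks.extend(np.array_split(band, cols, axis=1))
    return blocks
```

Each of the nine blocks would then be resized to the network's input size and passed through an ImageNet-pretrained AlexNet (for example torchvision.models.alexnet) to obtain one CNN feature vector per block; which layer's activations are used is specified in the paper, not here.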

Training models

For training the global-level subnetwork:

Run train_global.py:

python train_global.py \
    --data_path ./data/coarse-grained-data \
    --data_name corel5k_precomp \
    --vocab_path ./vocab \
    --logger_name ./checkpoint/M2HSE/Global/Corel5K \
    --model_name ./checkpoint/M2HSE/Global/Corel5K \
    --num_epochs 100 \
    --lr_updata 50 \
    --batchsize 100 \
    --gamma_1 1 \
    --gamma_2 .5 \
    --alpha_1 .8 \
    --alpha_2 .8
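The name and value of --lr_updata (50, with --num_epochs 100) suggest a step learning-rate schedule. Assuming the flag behaves like --lr_update in VSE++ (the learning rate is multiplied by 0.1 every lr_update epochs), the schedule for this command looks like the sketch below; learning_rate is a hypothetical helper, and base_lr and the 0.1 decay factor are assumptions, not values read from train_global.py.

```python
def learning_rate(epoch: int, base_lr: float, lr_update: int, decay: float = 0.1) -> float:
    """Step decay: multiply base_lr by `decay` once every `lr_update` epochs.

    Assumes --lr_updata follows the VSE++ --lr_update convention; base_lr and
    decay are placeholders, not values taken from this repository.
    """
    return base_lr * (decay ** (epoch // lr_update))
```

Under this assumption, epochs 0 to 49 train at base_lr and epochs 50 to 99 at base_lr * 0.1. In plain PyTorch the same schedule corresponds to torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1).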

For training the local-level subnetwork:

Run train_local.py:

python train_local.py \
    --data_path ./data/fine-grained-data \
    --data_name corel5k_precomp \
    --vocab_path ./vocab \
    --logger_name ./checkpoint/M2HSE/Local/Corel5K \
    --model_name ./checkpoint/M2HSE/Local/Corel5K \
    --num_epochs 100 \
    --lr_updata 50 \
    --batchsize 100 \
    --gamma_1 1 \
    --gamma_2 .5 \
    --beta_1 .4 \
    --beta_2 .4

Reference

Stay tuned. :)

License

Apache License 2.0

Owner

Xinlei-Pei
