Code and Resources for the Transformer Encoder Reasoning Network (TERN)

Last update: Dec 30, 2022

Overview

Transformer Encoder Reasoning Network

Code for the cross-modal visual-linguistic retrieval method from "Transformer Reasoning Network for Image-Text Matching and Retrieval", accepted to ICPR 2020 [Pre-print PDF].

This repo is built on top of VSE++.

Setup

Clone the repo and move into it:

git clone https://github.com/mesnico/TERN
cd TERN

Set up the Python environment using conda:

conda env create --file environment.yml
conda activate tern
export PYTHONPATH=.

Get the data

Download and extract the data folder, which contains the COCO annotations, the splits by Karpathy et al., and the precomputed ROUGE-L and SPICE relevances:

wget http://datino.isti.cnr.it/tern/data.tar
tar -xvf data.tar

Download the bottom-up features. We rearranged the ones provided by Anderson et al. into multiple .npy files, one for every image in the COCO dataset. This is beneficial during the data loading phase. The following commands extract them under data/coco/. If you prefer another location, be sure to adjust the configuration file accordingly.

wget http://datino.isti.cnr.it/tern/features_36.tar
tar -xvf features_36.tar -C data/coco
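
As a quick sanity check after extraction, the following minimal sketch counts the .npy files found under data/coco. It assumes only that the features are stored as .npy files somewhere below that folder; the function name count_feature_files is hypothetical and not part of this repo, and the exact sub-folder layout inside the archive is not assumed.

```python
from pathlib import Path


def count_feature_files(root="data/coco"):
    """Count .npy files anywhere under `root`; return 0 if `root` is missing.

    Hypothetical helper: the search is recursive because the exact
    sub-folder layout of the extracted archive is not assumed here.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return 0
    return sum(1 for _ in root_path.rglob("*.npy"))


if __name__ == "__main__":
    n = count_feature_files()
    print(f"Found {n} .npy feature files under data/coco")
```

A count of zero suggests the archive was extracted elsewhere, in which case the path in the configuration file needs to point to the actual location.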

Evaluate

Download our pre-trained TERN model:

wget http://datino.isti.cnr.it/tern/model_best_ndcg.pth

Then run the following commands to evaluate the model on the 1k (5-fold cross-validation) or 5k test sets:

python3 test.py model_best_ndcg.pth --config configs/tern.yaml --size 1k
python3 test.py model_best_ndcg.pth --config configs/tern.yaml --size 5k
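
To illustrate what the 1k (5-fold) setting means, here is a minimal sketch under the assumption that it follows the usual protocol from VSE++: the 5k test set is split into five consecutive folds of 1,000 images, retrieval is evaluated within each fold, and the per-fold scores are averaged. The function names below are hypothetical and Recall@K is used only as an example metric; this is not the code in test.py.

```python
from statistics import mean


def fold_slices(n_items, fold_size):
    """Return (start, end) index pairs covering n_items in consecutive folds."""
    return [(start, min(start + fold_size, n_items))
            for start in range(0, n_items, fold_size)]


def recall_at_k(ranks, k):
    """Percentage of queries whose correct item is ranked in the top k.

    `ranks` holds the 0-based rank of the correct item for each query,
    computed against the candidates of a single fold.
    """
    return 100.0 * sum(r < k for r in ranks) / len(ranks)


def cross_validated_recall(ranks_per_fold, k):
    """Average Recall@K over folds (assumed 1k protocol)."""
    return mean(recall_at_k(ranks, k) for ranks in ranks_per_fold)


if __name__ == "__main__":
    # Five folds of 1,000 images each, covering the 5k test set.
    print(fold_slices(5000, 1000))
```

In the 5k setting there is a single fold containing all test images, so each query competes against five times as many candidates and the scores are typically lower.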

Train

To train the model using the basic TERN configuration, run the following command:

python3 train.py --config configs/tern.yaml --logger_name runs/tern

runs/tern is where the output files (tensorboard logs, checkpoints) will be stored during this training session.
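
To locate the most recent checkpoint written to the run folder, a small helper along these lines may be handy. It is a sketch under assumptions: the .pth extension is inferred from the released model_best_ndcg.pth, the checkpoint file names used by train.py are not stated here, and latest_checkpoint is a hypothetical name.

```python
from pathlib import Path


def latest_checkpoint(run_dir="runs/tern", pattern="*.pth"):
    """Return the most recently modified file matching `pattern`, or None.

    Assumption: checkpoints are saved with a .pth extension somewhere
    under `run_dir`; adjust `pattern` if the training script differs.
    """
    run_path = Path(run_dir)
    if not run_path.is_dir():
        return None
    candidates = [p for p in run_path.rglob(pattern) if p.is_file()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    print(latest_checkpoint())
```

The returned path can then be passed to test.py in place of model_best_ndcg.pth.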

Reference

If you find this code useful, please cite the following paper:

@article{messina2020transformer,
  title={Transformer Reasoning Network for Image-Text Matching and Retrieval},
  author={Messina, Nicola and Falchi, Fabrizio and Esuli, Andrea and Amato, Giuseppe},
  journal={arXiv preprint arXiv:2004.09144},
  year={2020}
}

License

Apache License 2.0

Owner

Nicola Messina
