Implementation of 'X-Linear Attention Networks for Image Captioning' [CVPR 2020]

Last update: Dec 17, 2022

Overview

Introduction

This repository contains the implementation of X-Linear Attention Networks for Image Captioning (CVPR 2020). The original paper can be found here.

Please cite with the following BibTeX:

@inproceedings{xlinear2020cvpr,
  title={X-Linear Attention Networks for Image Captioning},
  author={Pan, Yingwei and Yao, Ting and Li, Yehao and Mei, Tao},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2020}
}

Requirements

Python 3
CUDA 10
numpy
tqdm
easydict
PyTorch (>1.0)
torchvision
coco-caption

Data preparation

Download the bottom-up features and convert them to npz files:

python2 tools/create_feats.py --infeats bottom_up_tsv --outfolder ./mscoco/feature/up_down_10_100

Download the annotations into the mscoco folder. For more details about data preparation, refer to self-critical.pytorch.
Download coco-caption and set __C.INFERENCE.COCO_PATH in lib/config.py to its location.
The pretrained models and results can be downloaded here.
The pretrained SENet-154 model can be downloaded here.
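
The feature conversion in the first step can be sketched roughly as follows. This is a minimal illustration, not the repository's tools/create_feats.py: it assumes the TSV uses the column layout of the publicly released bottom-up-attention files (image_id, image_w, image_h, num_boxes, boxes, features, with base64-encoded float32 arrays), and the `feat` key and the function names are hypothetical choices made for this example.

```python
import base64
import csv
import os

import numpy as np

# Assumed column layout of the bottom-up-attention TSV files;
# verify it against the file you actually downloaded.
FIELDNAMES = ["image_id", "image_w", "image_h", "num_boxes", "boxes", "features"]


def decode_row(row):
    """Decode one TSV row into (image_id, float32 array of shape [num_boxes, dim])."""
    num_boxes = int(row["num_boxes"])
    raw = base64.b64decode(row["features"])
    feats = np.frombuffer(raw, dtype=np.float32).reshape(num_boxes, -1)
    return row["image_id"], feats


def convert(tsv_path, out_folder):
    """Write one <image_id>.npz per row; the 'feat' key is a hypothetical choice."""
    os.makedirs(out_folder, exist_ok=True)
    # The feature column is a very long string, so raise the csv field limit.
    csv.field_size_limit(2**31 - 1)
    with open(tsv_path, "r", newline="") as f:
        reader = csv.DictReader(f, delimiter="\t", fieldnames=FIELDNAMES)
        for row in reader:
            image_id, feats = decode_row(row)
            # np.savez_compressed appends the .npz extension itself.
            np.savez_compressed(os.path.join(out_folder, image_id), feat=feats)
```

Under these assumptions, `convert("bottom_up_tsv", "./mscoco/feature/up_down_10_100")` would mirror the command above, producing one compressed file per image.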

Training

Train X-LAN model

bash experiments/xlan/train.sh

Train X-LAN model using self-critical training

Copy the pretrained model into experiments/xlan_rl/snapshot and run the script

bash experiments/xlan_rl/train.sh

Train X-LAN transformer model

bash experiments/xtransformer/train.sh

Train X-LAN transformer model using self-critical training

Copy the pretrained model into experiments/xtransformer_rl/snapshot and run the script

bash experiments/xtransformer_rl/train.sh
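
Both self-critical recipes start by copying a pretrained checkpoint into the experiment's snapshot folder before launching train.sh. A small helper for that step might look like the sketch below. The function names and the checkpoint file name are hypothetical; only the experiments/*/snapshot layout and the train.sh scripts come from the instructions above.

```python
import os
import shutil
import subprocess


def stage_pretrained(checkpoint, experiment_dir):
    """Copy a checkpoint into <experiment_dir>/snapshot and return the new file path."""
    snapshot_dir = os.path.join(experiment_dir, "snapshot")
    os.makedirs(snapshot_dir, exist_ok=True)
    # copy2 keeps file metadata and returns the path of the copied file.
    return shutil.copy2(checkpoint, snapshot_dir)


def train_self_critical(checkpoint, experiment_dir):
    """Stage the checkpoint, then run the experiment's train.sh from the repo root."""
    stage_pretrained(checkpoint, experiment_dir)
    script = os.path.join(experiment_dir, "train.sh")
    subprocess.run(["bash", script], check=True)
```

For example, `train_self_critical("path/to/pretrained_model.pth", "experiments/xlan_rl")` would stage the file and start training, where the checkpoint path is a placeholder for whichever model you trained or downloaded.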

Evaluation

CUDA_VISIBLE_DEVICES=0 python3 main_test.py --folder experiments/model_folder --resume model_epoch
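
In the command above, model_folder and model_epoch are placeholders for an experiment folder and the snapshot to load. To evaluate several snapshots in a row, the same command can be assembled programmatically. The sketch below assumes --resume accepts an epoch number; the helper names are hypothetical, and only the --folder and --resume flags are taken from the command above.

```python
import os
import subprocess


def build_eval_command(folder, epoch):
    """Return the argv list for evaluating one snapshot with main_test.py."""
    return ["python3", "main_test.py", "--folder", str(folder), "--resume", str(epoch)]


def evaluate(folder, epochs, gpu="0", dry_run=True):
    """Evaluate several snapshots in turn on one GPU; print only when dry_run is set."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    commands = [build_eval_command(folder, e) for e in epochs]
    for cmd in commands:
        if dry_run:
            print("CUDA_VISIBLE_DEVICES=%s %s" % (gpu, " ".join(cmd)))
        else:
            subprocess.run(cmd, env=env, check=True)
    return commands
```

Calling `evaluate("experiments/xlan", [28, 29, 30])` prints the three commands without running them; pass `dry_run=False` to execute them from the repository root.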

Acknowledgements

Thanks to the contributions of self-critical.pytorch and the awesome PyTorch team.


Owner

JDAI-CV
