TensorFlow Implementation of "Show, Attend and Tell"

Last update: Nov 29, 2022

Overview

Show, Attend and Tell

Update (December 2, 2016): This is a TensorFlow implementation of Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, which introduces an attention-based image caption generator. The model shifts its attention to the relevant part of the image as it generates each word.
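To make the idea of "shifting attention" concrete, here is a minimal pure-Python sketch of the soft (deterministic) attention step: relevance scores over image regions are normalized with a softmax, and the context vector is the weighted sum of the region features. This is an illustration only, not the code used in this repo. The function names and the toy data are hypothetical, and the scoring network that would produce `scores` from the decoder state is left out.

```python
import math


def softmax(scores):
    """Turn raw relevance scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(features, scores):
    """Weighted sum of region features (hypothetical helper).

    features: list of L region vectors, each of dimension D
    scores:   list of L raw relevance scores, one per region
    """
    weights = softmax(scores)
    dim = len(features[0])
    context = [
        sum(w * f[d] for w, f in zip(weights, features))
        for d in range(dim)
    ]
    return weights, context


# Toy example: three image regions with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 0.0]  # equal scores give uniform attention
weights, context = soft_attention(features, scores)
```

With equal scores every region receives the same weight, so the context vector is simply the mean of the region features. Raising one score relative to the others moves the context vector toward that region's features.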

References

Authors' Theano code: https://github.com/kelvinxu/arctic-captions

Another TensorFlow implementation: https://github.com/jazzsaxmafia/show_attend_and_tell.tensorflow

Getting Started

Prerequisites

First, clone this repo and coco-caption (which provides pycocoevalcap) into the same directory.

$ git clone https://github.com/yunjey/show-attend-and-tell-tensorflow.git
$ git clone https://github.com/tylin/coco-caption.git

This code is written in Python 2.7 and requires TensorFlow 1.2. In addition, you need to install a few more packages to process the MSCOCO dataset. I have provided a script that downloads the MSCOCO image dataset and the VGGNet19 model. Downloading the data may take several hours, depending on your network speed. Run the commands below; the images will be downloaded to the image/ directory and the VGGNet19 model to the data/ directory.

$ cd show-attend-and-tell-tensorflow
$ pip install -r requirements.txt
$ chmod +x ./download.sh
$ ./download.sh

To feed the images to VGGNet, you need to resize the MSCOCO images to a fixed size of 224x224. Run the command below; the resized images will be stored in the image/train2014_resized/ and image/val2014_resized/ directories.

$ python resize.py

Before training the model, you have to preprocess the MSCOCO caption dataset. To generate the caption dataset and the image feature vectors, run the command below.

$ python prepro.py
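For readers unfamiliar with caption preprocessing, the sketch below shows one common way to turn raw captions into fixed-length integer sequences: build a word-to-index vocabulary, then encode each caption with start, end, and padding tokens. It is not taken from prepro.py. The function names, the special token names (`<NULL>`, `<START>`, `<END>`), the `threshold` and `max_length` parameters, and the whitespace tokenization are all assumptions made for illustration.

```python
from collections import Counter


def build_vocab(captions, threshold=1):
    """Map each word seen at least `threshold` times to an integer id."""
    counter = Counter()
    for caption in captions:
        counter.update(caption.lower().split())
    # Reserve the first ids for the (assumed) special tokens.
    word_to_idx = {"<NULL>": 0, "<START>": 1, "<END>": 2}
    for word, count in sorted(counter.items()):
        if count >= threshold:
            word_to_idx[word] = len(word_to_idx)
    return word_to_idx


def encode(caption, word_to_idx, max_length):
    """Encode a caption as a list of max_length + 2 ids, padded with <NULL>."""
    # Drop out-of-vocabulary words and truncate to max_length words.
    tokens = [w for w in caption.lower().split() if w in word_to_idx]
    tokens = tokens[:max_length]
    ids = [word_to_idx["<START>"]]
    ids += [word_to_idx[w] for w in tokens]
    ids.append(word_to_idx["<END>"])
    ids += [word_to_idx["<NULL>"]] * (max_length + 2 - len(ids))
    return ids


captions = ["a plane flying in the sky", "a zebra standing in the grass"]
vocab = build_vocab(captions)
ids = encode("a zebra in the sky", vocab, max_length=6)
```

Padding every caption to the same length lets a batch of captions be stored as a single integer array, which is convenient for feeding a recurrent decoder.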

Train the model

To train the image captioning model, run the command below.

$ python train.py

(optional) Tensorboard visualization

I have provided a TensorBoard visualization for real-time debugging. Open a new terminal, run the command below, and open http://localhost:6005/ in your web browser.

$ tensorboard --logdir='./log' --port=6005

Evaluate the model

To generate captions, visualize attention weights and evaluate the model, please see evaluate_model.ipynb.

Results

Training data

(1) Generated caption: A plane flying in the sky with a landing gear down.

(2) Generated caption: A giraffe and two zebra standing in the field.

Validation data

(1) Generated caption: A large elephant standing in a dry grass field.

(2) Generated caption: A baby elephant standing on top of a dirt field.

Test data

(1) Generated caption: A plane flying over a body of water.

(2) Generated caption: A zebra standing in the grass near a tree.

Owner

Yunjey Choi
