Deep-Learning-Image-Captioning - Implementing convolutional and recurrent neural networks in Keras to generate sentence descriptions of images

Last update: Apr 06, 2022

Overview

Deep Learning - Image Captioning with Convolutional and Recurrent Neural Nets

========================================================================

Author: Jonathan Kuo
Python: 3.6.1
TensorFlow: 1.0.1
Keras: 2.0.4

Implementing convolutional and recurrent neural networks in Keras to generate sentence descriptions of images

Introduction

The Keras deep learning architecture of this project was inspired by "Deep Visual-Semantic Alignments for Generating Image Descriptions" by Andrej Karpathy and Fei-Fei Li.

Given a dataset of images and their sentence descriptions, this project defines a Keras (TensorFlow backend) deep learning model that aligns detected image regions with segments of the descriptions. This learned alignment allows the model to output novel descriptions for test images.
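As a rough illustration of how a trained next-word model can produce a description for a new image, here is a minimal greedy decoding sketch. It is not taken from the project's notebook: the `predict_next` callable, the `<start>` and `<end>` tokens, and the vocabulary dictionaries are all hypothetical names introduced for this example.

```python
def greedy_caption(predict_next, image_features, word_to_id, id_to_word, max_len=20):
    """Build a caption one word at a time, always taking the most likely next word.

    predict_next is a hypothetical callable: given image features and the list
    of word ids generated so far, it returns one probability per vocabulary id.
    """
    caption = [word_to_id["<start>"]]  # assumed start-of-sentence token
    for _ in range(max_len):
        probs = predict_next(image_features, caption)
        # Index of the highest-probability word (greedy choice).
        next_id = max(range(len(probs)), key=probs.__getitem__)
        if id_to_word[next_id] == "<end>":  # assumed end-of-sentence token
            break
        caption.append(next_id)
    # Drop the start token and map ids back to words.
    return " ".join(id_to_word[i] for i in caption[1:])
```

Greedy decoding is the simplest choice; beam search is a common alternative when caption quality matters more than speed.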

Dataset

Microsoft Common Objects in Context (MSCOCO) is an image recognition, segmentation, and captioning dataset. The training data includes 123,000 image-caption pairs. The validation and test sets each contain 5,000 image-caption pairs.

Architecture

The VGG16 CNN architecture (loaded in Keras) with weights pre-trained on ImageNet is used as the CNN to detect objects in the image. The last dense layer, the 1000-class softmax classifier, is removed so that the 4096-D activations can be passed into the RNN (LSTM). CNN weights are frozen, and RNN weights are updated by backpropagation through time (BPTT). The CNN and LSTM outputs are merged before being passed into a second LSTM, which predicts the next word in the sequence. RMSprop is used as the optimizer to help mitigate the vanishing gradient problem.
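The architecture described above could be sketched roughly as follows. This is one possible reading, not the project's actual code: it uses the current `tensorflow.keras` functional API rather than the Keras 2.0.4 API listed above, and the function name, layer sizes, and the way the image features are repeated across time steps are assumptions made for illustration.

```python
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16


def build_caption_model(vocab_size, max_len, embed_dim=256, units=256,
                        cnn_weights="imagenet"):
    # VGG16 with its dense head; "fc2" is the 4096-D layer that sits just
    # before the final softmax ("predictions") layer.
    vgg = VGG16(weights=cnn_weights, include_top=True)
    vgg.trainable = False  # freeze the CNN weights
    image_features = vgg.get_layer("fc2").output  # shape: (batch, 4096)

    # Assumption: project the image vector and repeat it at every time step
    # so it can be merged with the word sequence.
    img = layers.Dense(embed_dim, activation="relu")(image_features)
    img = layers.RepeatVector(max_len)(img)

    # Language branch: embedded partial caption passed through a first LSTM.
    words_in = layers.Input(shape=(max_len,), name="partial_caption")
    txt = layers.Embedding(vocab_size, embed_dim)(words_in)
    txt = layers.LSTM(units, return_sequences=True)(txt)

    # Merge both branches, then a second LSTM predicts the next word.
    merged = layers.Concatenate()([img, txt])
    merged = layers.LSTM(units)(merged)
    next_word = layers.Dense(vocab_size, activation="softmax")(merged)

    model = Model(inputs=[vgg.input, words_in], outputs=next_word)
    model.compile(optimizer="rmsprop", loss="categorical_crossentropy")
    return model
```

With this sketch, each training example would pair a 224x224 RGB image and a partial caption padded to `max_len` with a one-hot vector for the next word. Passing `cnn_weights=None` builds the same graph with random CNN weights, which avoids the ImageNet download when only the wiring needs checking.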

Demo

View the demo IPython notebook for model training and prediction on the MSCOCO dataset.
