Video-Captioning - A machine learning project to generate captions for video frames indicating the relationship between the objects in the video

Last update: Jan 23, 2022

Overview

Video-Captioning

A machine learning project to generate captions for video frames that describe the relationships between the objects in the video.

Approach

In our framework, we use a sequence-to-sequence model to predict visual relationships in videos: the input is a sequence of video frames and the output is a relation triplet <object1 - relationship - object2> that describes the video. In other words, we extend the sequence-to-sequence modelling approach to inputs that are sequences of video frames.

Figure: A bidirectional LSTM layer (red) encodes the visual feature inputs, and an LSTM layer (green) decodes the features into a sequence of words.
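The decoder in the figure emits its output one step at a time. As an illustration of how such output could become a relation triplet, the sketch below greedily picks the highest-scoring id at each of three decoder steps and maps it back to a label. It assumes one score list per triplet slot (object, relationship, object) and label-to-id dictionaries shaped like object1_object2.json and relationship.json; the vocabularies, scores, and function names are hypothetical and are not taken from the repository.

```python
# Hypothetical label -> id vocabularies, in the style of object1_object2.json
# and relationship.json. The labels and ids are made up for illustration.
OBJECTS = {"person": 0, "dog": 1, "ball": 2}
RELATIONSHIPS = {"next_to": 0, "chase": 1, "hold": 2}


def invert(vocab):
    """Turn a label -> id dictionary into an id -> label lookup."""
    return {idx: label for label, idx in vocab.items()}


def argmax(scores):
    """Index of the largest score (ties resolve to the first occurrence)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def decode_triplet(step_scores, objects=OBJECTS, relationships=RELATIONSHIPS):
    """Greedily decode three decoder steps into (object1, relationship, object2).

    Assumption: step 0 and step 2 score the object vocabulary and
    step 1 scores the relationship vocabulary.
    """
    id_to_obj = invert(objects)
    id_to_rel = invert(relationships)
    obj1_scores, rel_scores, obj2_scores = step_scores
    return (
        id_to_obj[argmax(obj1_scores)],
        id_to_rel[argmax(rel_scores)],
        id_to_obj[argmax(obj2_scores)],
    )


scores = [
    [0.1, 0.7, 0.2],  # object1 slot -> id 1 ("dog")
    [0.2, 0.5, 0.3],  # relationship slot -> id 1 ("chase")
    [0.6, 0.1, 0.3],  # object2 slot -> id 0 ("person")
]
print("<" + " - ".join(decode_triplet(scores)) + ">")  # prints <dog - chase - person>
```

Greedy decoding keeps the sketch short; a real decoder might instead use beam search or constrain each step to the appropriate vocabulary inside the model.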

Python Dependencies

Pandas
Keras
TensorFlow
NumPy
albumentations
Pillow
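The list above gives package names only, without versions. Before running train.py, a small check like the following (not part of the repository) can report which of them are missing. A package's pip name and import name can differ, for example Pillow is imported as PIL; the mapping below assumes the remaining packages are imported under their lower-cased names.

```python
import importlib.util

# Assumed mapping: pip distribution name -> module name used in `import`.
# Pillow is imported as PIL; the others use their lower-cased names.
REQUIRED = {
    "pandas": "pandas",
    "keras": "keras",
    "tensorflow": "tensorflow",
    "numpy": "numpy",
    "albumentations": "albumentations",
    "Pillow": "PIL",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose top-level module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + " ".join(missing))
    else:
        print("All dependencies found.")
```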

Procedure

Training

To train the model, run the script train.py:

  python train.py

To train on your own dataset, save your data in a directory (see the data folder for the expected format) and update the following JSON files:

object1_object2.json: A dictionary mapping each object label (key) to its id (value).
relationship.json: A dictionary mapping each relationship label (key) to its id (value).
training_annotations.json: It contains a dictionary for each video in the training data, with video ids as keys and a list of as values.
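The two vocabulary files can be generated from your annotations rather than written by hand. The sketch below assumes each file holds a single label-to-id dictionary and that each training annotation is an (object1, relationship, object2) label triplet; both are assumptions, so compare the output with the files in the data folder. The function names and the toy data are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def build_vocab(labels):
    """Assign consecutive integer ids to the unique labels, in sorted order."""
    return {label: idx for idx, label in enumerate(sorted(set(labels)))}


def write_vocab_files(annotations, out_dir):
    """Write object1_object2.json and relationship.json into out_dir.

    `annotations` maps a video id to a list of (object1, relationship, object2)
    label triplets. This format is an assumption made for illustration.
    """
    objects, relationships = [], []
    for triplets in annotations.values():
        for obj1, rel, obj2 in triplets:
            objects.extend([obj1, obj2])
            relationships.append(rel)
    obj_vocab = build_vocab(objects)
    rel_vocab = build_vocab(relationships)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "object1_object2.json").write_text(json.dumps(obj_vocab, indent=2))
    (out / "relationship.json").write_text(json.dumps(rel_vocab, indent=2))
    return obj_vocab, rel_vocab


# Toy annotations for one hypothetical video.
toy = {"video_0001": [("dog", "chase", "person"), ("person", "hold", "ball")]}
with tempfile.TemporaryDirectory() as tmp:
    objs, rels = write_vocab_files(toy, tmp)
    print(objs)  # {'ball': 0, 'dog': 1, 'person': 2}
    print(rels)  # {'chase': 0, 'hold': 1}
```

Sorting the labels keeps the ids stable across runs. If a trained model already depends on a particular id order, reuse the existing files instead of regenerating them.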

When running the script, provide the path to your data directory.

  python train.py --train_data

Testing

To test the model or make predictions on your own dataset, run the script eval.py:

  python eval.py --test_data

The predictions are saved to the CSV file 'test_data_predictions.csv'.
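To inspect the predictions, the CSV file can be read with Python's standard csv module. The layout of test_data_predictions.csv is not documented here, so the column names in this sketch (video_id, object1, relationship, object2) are hypothetical; check the header row of your file and adjust them.

```python
import csv
import io
from collections import Counter


def load_predictions(lines):
    """Parse prediction rows into (video_id, (object1, relationship, object2)) pairs.

    The column names used here are hypothetical; match them to the header
    row of the real test_data_predictions.csv.
    """
    reader = csv.DictReader(lines)
    return [
        (row["video_id"], (row["object1"], row["relationship"], row["object2"]))
        for row in reader
    ]


def relationship_counts(predictions):
    """Count how often each relationship was predicted."""
    return Counter(triplet[1] for _video_id, triplet in predictions)


# For the real file: open("test_data_predictions.csv", newline="") and pass the handle.
sample = io.StringIO(
    "video_id,object1,relationship,object2\n"
    "v1,dog,chase,person\n"
    "v2,person,hold,ball\n"
    "v3,dog,chase,ball\n"
)
preds = load_predictions(sample)
print(relationship_counts(preds).most_common(1))  # [('chase', 2)]
```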
