SA-tensorflow

Tensorflow implementation of soft-attention mechanism for video caption generation.

An example of the soft-attention mechanism: the attention weight alpha indicates the temporal attention over the video frames for each generated word.

[Yao et al. 2015, Describing Videos by Exploiting Temporal Structure] The original Torch implementation can be found here.
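For intuition, here is a minimal NumPy sketch of the temporal soft-attention step described in [1]; the parameter names are illustrative and not taken from this repository. At each decoding step the decoder scores every frame (or clip) feature against its previous hidden state, normalizes the scores into the weights alpha with a softmax, and feeds the weighted sum of frame features to the LSTM.

```python
import numpy as np

def soft_attention(frame_feats, prev_state, W_h, U_v, w_a, b_a):
    """Temporal soft attention in the spirit of Yao et al. 2015 (illustrative sketch).

    frame_feats: (n_frames, feat_dim)  per-frame or per-clip visual features
    prev_state:  (state_dim,)          previous decoder LSTM hidden state
    W_h, U_v, w_a, b_a:                attention parameters (hypothetical names)
    """
    # Unnormalized relevance score e_i for every frame
    scores = np.tanh(frame_feats @ U_v + prev_state @ W_h + b_a) @ w_a  # (n_frames,)
    # Attention weights alpha_i sum to 1 over the temporal axis
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()
    # Context vector: attention-weighted average of the frame features
    context = alpha @ frame_feats  # (feat_dim,)
    return alpha, context
```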

Prerequisites

  • Python 2.7
  • Tensorflow >= 0.7.1
  • NumPy
  • pandas
  • keras
  • Java 1.8.0

Data

The MSVD [2] dataset can be downloaded from here.

We pack the data into HDF5 format, where each file is one training mini-batch with the following keys:

[u'data', u'fname', u'label', u'title']

batch['data'] stores the visual features. shape (n_step_lstm, batch_size, hidden_dim)

batch['fname'] stores the filenames (no extension) of the videos. shape (batch_size)

batch['title'] stores the descriptions. If multiple sentences correspond to one video, the other metadata (visual features, filenames, and labels) must be duplicated to keep a one-to-one mapping. shape (batch_size)

batch['label'] indicates where the video ends. For instance, [-1., -1., -1., -1., 0., -1., -1.] means that the video ends at index 4. shape (n_step_lstm, batch_size)
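A quick way to sanity-check one of these mini-batch files with h5py (the filename below is only a placeholder):

```python
import h5py

# Open one training mini-batch (placeholder filename)
with h5py.File('train_batch_000.h5', 'r') as batch:
    print(list(batch.keys()))      # [u'data', u'fname', u'label', u'title']
    print(batch['data'].shape)     # (n_step_lstm, batch_size, hidden_dim)
    print(batch['label'].shape)    # (n_step_lstm, batch_size)
    print(batch['fname'][0], batch['title'][0])  # video filename and its caption
```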

Generate HDF5 data

We generate the HDF5 data by following the steps below. The code is a little messy; if you have any questions, feel free to ask.

1. Generate Label

Once you change the video_path and output_path, you can generate labels by running the script:

python hdf5_generator/generate_nolabel.py

I set the length of each clip to 10 frames and the maximum number of frames per video to 450. You can change these parameters in the function get_frame_list(frame_num).
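As a rough illustration of that clipping scheme (not the actual get_frame_list() code), the frames of one video are capped at 450 and grouped into consecutive 10-frame clips:

```python
def get_frame_list(frame_num, clip_len=10, max_frames=450):
    """Illustrative sketch: split a video of `frame_num` frames into
    consecutive clips of `clip_len` frames, keeping at most `max_frames`."""
    frame_num = min(frame_num, max_frames)
    n_clips = frame_num // clip_len
    # Each entry is the list of frame indices belonging to one clip
    return [list(range(i * clip_len, (i + 1) * clip_len)) for i in range(n_clips)]
```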

2. Pack features together (no caption information)

Inputs:

label_path: The path for the labels generated earlier.

feature_path: The path that stores features such as VGG and C3D. You can change the directory name to whatever you want.

Outputs:

h5py_path: The path where the concatenated features are stored; the code will automatically put them in the subdirectory cont.

python hdf5_generator/input_generator.py

Note that in the function get_feats_depend_on_label() you can choose whether to take the mean feature or a randomly sampled frame feature for each clip. The random-sample code is commented out because it performed worse.
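The two pooling options boil down to something like the following sketch (illustrative only; the real get_feats_depend_on_label() operates on the generated label and feature files):

```python
import numpy as np

def pool_clip_features(frame_feats, use_mean=True, rng=np.random):
    """frame_feats: (clip_len, feat_dim) features of the frames in one clip."""
    if use_mean:
        # Mean feature over the clip (the default, better-performing option)
        return frame_feats.mean(axis=0)
    # Random-sample alternative (commented out in the repo; performed worse)
    return frame_feats[rng.randint(len(frame_feats))]
```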

3. Add captions into HDF5 data

I set the maximum number of words in a caption to 35. The feature folder is where the final output features are stored.

python hdf5_generator/trans_video_youtube.py

(The code here was written by Kuo-Hao.)
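Caption handling amounts to mapping each sentence to word indices and padding or truncating it to the 35-word limit; a rough sketch is below (the vocabulary dictionary and the '<unk>' token are assumptions, not the repo's actual API):

```python
def encode_caption(sentence, word_to_idx, max_words=35, pad_idx=0):
    """Illustrative sketch: turn a caption into a fixed-length index vector."""
    tokens = sentence.lower().split()[:max_words]   # truncate to at most 35 words
    idxs = [word_to_idx.get(t, word_to_idx.get('<unk>', pad_idx)) for t in tokens]
    idxs += [pad_idx] * (max_words - len(idxs))     # pad to the fixed length
    return idxs
```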

Generate data list

video_data_path_train = '$ROOTPATH/SA-tensorflow/examples/train_vn.txt'

You can change the path variable to the absolute path of your data. Then simply run python getlist.py to generate the list.

P.S. The filenames of HDF5 data start with train, val, test.
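Conceptually, getlist.py just collects the HDF5 batch filenames for each split into a text list; a sketch under those assumptions (paths, the .h5 extension, and the output names are placeholders):

```python
import glob
import os

data_dir = '$ROOTPATH/SA-tensorflow/examples'   # placeholder: use your absolute data path
for split in ('train', 'val', 'test'):
    # HDF5 batch filenames start with train, val, or test (extension assumed here)
    files = sorted(glob.glob(os.path.join(data_dir, split + '*.h5')))
    with open(os.path.join(data_dir, split + '_vn.txt'), 'w') as f:
        f.write('\n'.join(files))
```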

Usage

training

$ python Att.py --task train

testing

Test the model after a certain number of training epochs.

$ python Att.py --task test --net models/model-20

Author

Tseng-Hung Chen

Kuo-Hao Zeng

Disclaimer

We modified the code from the repository jazzsaxmafia/video_to_sequence into the temporal-attention model.

References

[1] L. Yao, A. Torabi, K. Cho, N. Ballas, C. Pal, H. Larochelle, and A. Courville. Describing videos by exploiting temporal structure. arXiv:1502.08029v4, 2015.

[2] D. L. Chen and W. B. Dolan. Collecting Highly Parallel Data for Paraphrase Evaluation. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011), Portland, OR, June 2011.

[3] Microsoft COCO Caption Evaluation
