Videocaptioning.pytorch - A simple implementation of video captioning

Overview

pytorch implementation of video captioning

recommend installing pytorch and python packages using Anaconda

This code is based on video-caption.pytorch

requirements (my environment, other versions of pytorch and torchvision should also support this code (not been verified!))

  • cuda
  • pytorch 1.7.1
  • torchvision 0.8.2
  • python 3
  • ffmpeg (can install using anaconda)

python packages

  • tqdm
  • pillow
  • nltk

Data

MSR-VTT. Download and put them in ./data/msr-vtt-data directory

|-data
  |-msr-vtt-data
    |-train-video
    |-test-video
    |-annotations
      |-train_val_videodatainfo.json
      |-test_videodatainfo.json

MSVD. Download and put them in ./data/msvd-data directory

|-data
  |-msvd-data
    |-YouTubeClips
    |-annotations
      |-AllVideoDescriptions.txt

Options

all default options are defined in opt.py or corresponding code file, change them for your like.

Acknowledgements

Some code refers to ImageCaptioning.pytorch

Usage

(Optional) c3d features (not verified)

you can use video-classification-3d-cnn-pytorch to extract features from video.

Steps

  1. preprocess MSVD annotations (convert txt file to json file)

refer to data/msvd-data/annotations/prepro_annotations.ipynb

  1. preprocess videos and labels
# For MSR-VTT dataset
# Train and Validata set
CUDA_VISIBLE_DEVICES=0 python prepro_feats.py \
    --video_path ./data/msr-vtt-data/train-video \
    --video_suffix mp4 \
    --output_dir ./data/msr-vtt-data/resnet152 \
    --model resnet152 \
    --n_frame_steps 40

# Test set
CUDA_VISIBLE_DEVICES=0 python prepro_feats.py \
    --video_path ./data/msr-vtt-data/test-video \
    --video_suffix mp4 \
    --output_dir ./data/msr-vtt-data/resnet152 \
    --model resnet152 \
    --n_frame_steps 40

python prepro_vocab.py \
    --input_json data/msr-vtt-data/annotations/train_val_videodatainfo.json data/msr-vtt-data/annotations/test_videodatainfo.json \
    --info_json data/msr-vtt-data/info.json \
    --caption_json data/msr-vtt-data/caption.json \
    --word_count_threshold 4

# For MSVD dataset
CUDA_VISIBLE_DEVICES=0 python prepro_feats.py \
    --video_path ./data/msvd-data/YouTubeClips \
    --video_suffix avi \
    --output_dir ./data/msvd-data/resnet152 \
    --model resnet152 \
    --n_frame_steps 40

python prepro_vocab.py \
    --input_json data/msvd-data/annotations/MSVD_annotations.json \
    --info_json data/msvd-data/info.json \
    --caption_json data/msvd-data/caption.json \
    --word_count_threshold 2
  1. Training a model
# For MSR-VTT dataset
CUDA_VISIBLE_DEVICES=0 python train.py \
    --epochs 1000 \
    --batch_size 300 \
    --checkpoint_path data/msr-vtt-data/save \
    --input_json data/msr-vtt-data/annotations/train_val_videodatainfo.json \
    --info_json data/msr-vtt-data/info.json \
    --caption_json data/msr-vtt-data/caption.json \
    --feats_dir data/msr-vtt-data/resnet152 \
    --model S2VTAttModel \
    --with_c3d 0 \
    --dim_vid 2048

# For MSVD dataset
CUDA_VISIBLE_DEVICES=0 python train.py \
    --epochs 1000 \
    --batch_size 300 \
    --checkpoint_path data/msvd-data/save \
    --input_json data/msvd-data/annotations/train_val_videodatainfo.json \
    --info_json data/msvd-data/info.json \
    --caption_json data/msvd-data/caption.json \
    --feats_dir data/msvd-data/resnet152 \
    --model S2VTAttModel \
    --with_c3d 0 \
    --dim_vid 2048
  1. test

    opt_info.json will be in same directory as saved model.

# For MSR-VTT dataset
CUDA_VISIBLE_DEVICES=0 python eval.py \
    --input_json data/msr-vtt-data/annotations/test_videodatainfo.json \
    --recover_opt data/msr-vtt-data/save/opt_info.json \
    --saved_model data/msr-vtt-data/save/model_xxx.pth \
    --batch_size 100

# For MSVD dataset
CUDA_VISIBLE_DEVICES=0 python eval.py \
    --input_json data/msvd-data/annotations/test_videodatainfo.json \
    --recover_opt data/msvd-data/save/opt_info.json \
    --saved_model data/msvd-data/save/model_xxx.pth \
    --batch_size 100

NOTE

This code is just a simple implementation of video captioning. And I have not verify whether the SCST training process and C3D feature are useful!

Acknowledgements

Some code refers to ImageCaptioning.pytorch

Owner
Yiyu Wang
Yiyu Wang
Statistical and Algorithmic Investing Strategies for Everyone

Eiten - Algorithmic Investing Strategies for Everyone Eiten is an open source toolkit by Tradytics that implements various statistical and algorithmic

Tradytics 2.5k Jan 02, 2023
Tool for working with Y-chromosome data from YFull and FTDNA

ycomp ycomp is a tool for working with Y-chromosome data from YFull and FTDNA. Run ycomp -h for information on how to use the program. Installation Th

Alexander Regueiro 2 Jun 18, 2022
PyTorch implementation of D2C: Diffuison-Decoding Models for Few-shot Conditional Generation.

D2C: Diffuison-Decoding Models for Few-shot Conditional Generation Project | Paper PyTorch implementation of D2C: Diffuison-Decoding Models for Few-sh

Jiaming Song 90 Dec 27, 2022
Deployment of PyTorch chatbot with Flask

Chatbot Deployment with Flask and JavaScript In this tutorial we deploy the chatbot I created in this tutorial with Flask and JavaScript. This gives 2

Patrick Loeber (Python Engineer) 107 Dec 29, 2022
Official implementation for "Symbolic Learning to Optimize: Towards Interpretability and Scalability"

Symbolic Learning to Optimize This is the official implementation for ICLR-2022 paper "Symbolic Learning to Optimize: Towards Interpretability and Sca

VITA 8 Dec 19, 2022
Code for ICCV 2021 paper "HuMoR: 3D Human Motion Model for Robust Pose Estimation"

Code for ICCV 2021 paper "HuMoR: 3D Human Motion Model for Robust Pose Estimation"

Davis Rempe 367 Dec 24, 2022
Deep Learning to Improve Breast Cancer Detection on Screening Mammography

Shield: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Deep Learning to Improve Breast

Li Shen 305 Jan 03, 2023
Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model

Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model About This repository contains the code to replicate the syn

Haruka Kiyohara 12 Dec 07, 2022
AirLoop: Lifelong Loop Closure Detection

AirLoop This repo contains the source code for paper: Dasong Gao, Chen Wang, Sebastian Scherer. "AirLoop: Lifelong Loop Closure Detection." arXiv prep

Chen Wang 53 Jan 03, 2023
CVPR '21: In the light of feature distributions: Moment matching for Neural Style Transfer

In the light of feature distributions: Moment matching for Neural Style Transfer (CVPR 2021) This repository provides code to recreate results present

Nikolai Kalischek 49 Oct 13, 2022
Reduce end to end training time from days to hours (or hours to minutes), and energy requirements/costs by an order of magnitude using coresets and data selection.

COResets and Data Subset selection Reduce end to end training time from days to hours (or hours to minutes), and energy requirements/costs by an order

decile-team 244 Jan 09, 2023
A PyTorch implementation of the architecture of Mask RCNN

EDIT (AS OF 4th NOVEMBER 2019): This implementation has multiple errors and as of the date 4th, November 2019 is insufficient to be utilized as a reso

Sai Himal Allu 975 Dec 30, 2022
Weakly-supervised semantic image segmentation with CNNs using point supervision

Code for our ECCV paper What's the Point: Semantic Segmentation with Point Supervision. Summary This library is a custom build of Caffe for semantic i

27 Sep 14, 2022
Code for the paper Relation Prediction as an Auxiliary Training Objective for Improving Multi-Relational Graph Representations (AKBC 2021).

Relation Prediction as an Auxiliary Training Objective for Knowledge Base Completion This repo provides the code for the paper Relation Prediction as

Facebook Research 85 Jan 02, 2023
Efficient Multi Collection Style Transfer Using GAN

Proposed a new model that can make style transfer from single style image, and allow to transfer into multiple different styles in a single model.

Zhaozheng Shen 2 Jan 15, 2022
Saeed Lotfi 28 Dec 12, 2022
Prototypical Networks for Few shot Learning in PyTorch

Prototypical Networks for Few shot Learning in PyTorch Simple alternative Implementation of Prototypical Networks for Few Shot Learning (paper, code)

Orobix 835 Jan 08, 2023
Finite difference solution of 2D Poisson equation. Can handle Dirichlet, Neumann and mixed boundary conditions.

Poisson-solver-2D Finite difference solution of 2D Poisson equation Current version can handle Dirichlet, Neumann, and mixed (combination of Dirichlet

Mohammad Asif Zaman 34 Dec 23, 2022
The official GitHub repository for the Argoverse 2 dataset.

Argoverse 2 API Official GitHub repository for the Argoverse 2 family of datasets. If you have any questions or run into any problems with either the

Argo AI 156 Dec 23, 2022
Single-Shot Motion Completion with Transformer

Single-Shot Motion Completion with Transformer 👉 [Preprint] 👈 Abstract Motion completion is a challenging and long-discussed problem, which is of gr

FuxiCV 78 Dec 29, 2022