Code for the paper "Towards Diverse Paragraph Captioning for Untrimmed Videos" (CVPR 2021).

Last update: Oct 11, 2022

Overview

Towards Diverse Paragraph Captioning for Untrimmed Videos

This repository contains the PyTorch implementation of our paper Towards Diverse Paragraph Captioning for Untrimmed Videos (CVPR 2021).

Requirements

Python 3.6
Java 15.0.2
PyTorch 1.2
numpy, tqdm, h5py, scipy, six

Training & Inference

Data preparation

Download the pre-extracted video features for the ActivityNet Captions or Charades Captions dataset from BaiduNetdisk (code: he21).
Decompress the downloaded files into the corresponding dataset folder under the ordered_feature/ directory.
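
As a quick sanity check after decompressing, the following minimal sketch lists what ended up in a dataset folder. It assumes the dataset subfolders are named activitynet and charades (mirroring the names used under results/); the function name list_feature_files is hypothetical and not part of this repository.

```python
from pathlib import Path


def list_feature_files(root="ordered_feature", dataset="activitynet"):
    """Return the sorted entry names under <root>/<dataset>/.

    Assumption: the dataset subfolder is called "activitynet" or "charades".
    An empty list means the folder is missing or nothing was extracted.
    """
    folder = Path(root) / dataset
    if not folder.is_dir():
        return []
    return sorted(entry.name for entry in folder.iterdir())


# Example usage (run from the repository root):
#   print(len(list_feature_files("ordered_feature", "charades")))
```

An empty result usually means the archive was extracted to a different location than the training scripts expect.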

Start training

Train our model without reinforcement learning. Replace * with the dataset name, either activitynet or charades.

$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/dm.token/model.json ../results/*/dm.token/path.json --is_train
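
Because the * in the paths is a placeholder for the dataset name rather than a shell glob, it can help to build the command programmatically. The sketch below only assembles the argument list shown in this README; build_train_command is a hypothetical helper, not something the repository provides.

```python
def build_train_command(dataset, config="dm.token", resume_file=None):
    """Assemble the transformer.py training command for one dataset.

    `dataset` fills the `*` placeholder; `config` is the experiment folder
    under results/<dataset>/ (for example "dm.token" or "dm.token.rl").
    """
    if dataset not in ("activitynet", "charades"):
        raise ValueError("dataset must be 'activitynet' or 'charades'")
    base = f"../results/{dataset}/{config}"
    cmd = [
        "python", "transformer.py",
        f"{base}/model.json", f"{base}/path.json",
        "--is_train",
    ]
    if resume_file is not None:
        cmd += ["--resume_file", resume_file]
    return cmd


# Example usage (assumes the working directory is the repository root):
#   import os, subprocess
#   env = {**os.environ, "CUDA_VISIBLE_DEVICES": "0"}
#   subprocess.run(build_train_command("charades"), cwd="driver", env=env, check=True)
```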

Fine-tune the pretrained model with self-critical training, using both accuracy and diversity rewards.

$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/dm.token.rl/model.json ../results/*/dm.token.rl/path.json --is_train --resume_file ../results/*/dm.token/model/epoch.*.th
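
To illustrate the idea behind this fine-tuning step, here is a minimal sketch of the usual self-critical formulation, in which the reward of a sampled caption is compared against the reward of the greedily decoded caption. How this repository actually defines and mixes the accuracy and diversity rewards is not stated in this README, so the weighted sum and the div_weight parameter below are assumptions for illustration only.

```python
def combined_reward(accuracy, diversity, div_weight=0.5):
    """Mix two scalar rewards. The weighted sum is an assumption, not the paper's formula."""
    return accuracy + div_weight * diversity


def self_critical_advantage(sample_acc, sample_div, greedy_acc, greedy_div, div_weight=0.5):
    """Advantage = reward(sampled caption) - reward(greedy caption used as baseline)."""
    sampled = combined_reward(sample_acc, sample_div, div_weight)
    baseline = combined_reward(greedy_acc, greedy_div, div_weight)
    return sampled - baseline


def self_critical_loss(sample_logprobs, advantage):
    """Policy-gradient loss for one sampled caption: -advantage * sum of token log-probs."""
    return -advantage * sum(sample_logprobs)
```

A positive advantage means the sampled caption scored better than the greedy one, so minimizing the loss raises the probability of its tokens; a negative advantage lowers it.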

Train our model with key-frame selection.

$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/key_frames/model.json ../results/*/key_frames/path.json --is_train --resume_file ../results/*/key_frames/pretrained.th

With key-frame selection, only half of the video features are used at inference, which speeds up decoding at the cost of slightly worse results. Download the pretrained.th model first; it is required for key-frame selection.

Evaluation

Trained checkpoints are saved in the results/*/folder/model/ directory, where folder stands for the experiment directory (such as dm.token or dm.token.rl). After evaluation, the generated captions (corresponding to the name file in public_split) and the evaluation scores are saved in results/*/folder/pred/tst/.

$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/folder/model.json ../results/*/folder/path.json --eval_set tst --resume_file ../results/*/folder/model/epoch.*.th
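
The epoch.*.th argument has to be replaced with one concrete checkpoint file. The sketch below picks the checkpoint with the highest epoch number, assuming files are named epoch.<integer>.th; that naming detail and the helper name latest_checkpoint are assumptions, and the latest checkpoint is not necessarily the best-scoring one.

```python
import re
from pathlib import Path


def latest_checkpoint(model_dir):
    """Return the Path of the highest-numbered epoch.<N>.th file, or None if there is none.

    Assumption: <N> is an integer epoch index. Files that match the glob but
    not this pattern (for example epoch.best.th) are ignored.
    """
    pattern = re.compile(r"^epoch\.(\d+)\.th$")
    candidates = []
    for path in Path(model_dir).glob("epoch.*.th"):
        match = pattern.match(path.name)
        if match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    # Compare numerically so that epoch.10.th ranks above epoch.2.th.
    return max(candidates, key=lambda item: item[0])[1]
```

The returned path can be passed directly as the value of --resume_file.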

We also provide pretrained models for the ActivityNet dataset here and the Charades dataset here. These models were re-run and achieve results similar to those reported in the paper.

Reference

If you find this repo helpful, please consider citing:

@inproceedings{song2021paragraph,
  title={Towards Diverse Paragraph Captioning for Untrimmed Videos},
  author={Song, Yuqing and Chen, Shizhe and Jin, Qin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2021}
}


Owner

Yuqing Song
