Yet another video caption

Last update: May 26, 2022

Overview

yet-another-video-caption

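Dataset configuration

Preparing the dataset

Reorganize the raw dataset into the unified format shown below and place it in ./dataset.

The dataset is organized as follows: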

./dataset
    train/
        video/
            *.avi
        ...
        info.json
    test/
        video/
            *.avi
        ...
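
Before moving on, it can help to confirm that the layout above is actually in place. The following is a minimal sketch, not part of this repository: the function name check_dataset_layout is hypothetical, and it checks only the entries the tree lists explicitly (the video/ directories with .avi files, and info.json under train/). Whatever the "..." entries stand for is not checked.

```python
from pathlib import Path


def check_dataset_layout(root="./dataset"):
    """Return a list of human-readable problems found under `root`.

    Hypothetical helper: it only verifies the entries that the layout
    above names explicitly.
    """
    root = Path(root)
    problems = []
    for split in ("train", "test"):
        video_dir = root / split / "video"
        if not video_dir.is_dir():
            problems.append(f"missing directory: {video_dir}")
        elif not any(video_dir.glob("*.avi")):
            problems.append(f"no .avi files in: {video_dir}")
    # The layout above lists info.json only under train/.
    info = root / "train" / "info.json"
    if not info.is_file():
        problems.append(f"missing file: {info}")
    return problems
```

An empty return value means the listed entries were found; otherwise each string names one missing piece.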

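Automatic configuration

Usually you only need a subset of the dataset. In that case, consider running the automatic extraction script makedata.py.

All data is located in ./data.

All videos (train, val, and test) are located in ./data/video.

All video information (train, val, and test) is written to ./data/input.json.

The program generates some intermediate files in ./data. Do not modify them.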
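
This README does not describe how makedata.py picks its subset, so the following is only an illustrative sketch of one common approach: draw a reproducible random sample from a pool of video ids and cut it into train/val/test parts. The function name sample_and_split, its parameters, and the seed handling are all assumptions and are not taken from makedata.py.

```python
import random


def sample_and_split(video_ids, n_train, n_val, n_test, seed=0):
    """Pick n_train + n_val + n_test ids at random and split them.

    Hypothetical helper, not the logic of makedata.py. The ids are
    sorted first so the result depends only on the set of ids and
    the seed, not on the order in which they were listed.
    """
    total = n_train + n_val + n_test
    if total > len(video_ids):
        raise ValueError(
            f"requested {total} videos but only {len(video_ids)} are available"
        )
    rng = random.Random(seed)
    chosen = rng.sample(sorted(video_ids), total)
    return {
        "train": chosen[:n_train],
        "val": chosen[n_train:n_train + n_val],
        "test": chosen[n_train + n_val:],
    }
```

Because the sample is drawn without replacement, the three parts never share a video, and rerunning with the same seed reproduces the same split.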

Dependencies

pip install tqdm pillow pretrainedmodels nltk

In addition, make sure that the CUDA runtime, cuDNN, PyTorch (GPU build), ffmpeg, and a JDK are correctly set up in the current environment.
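
A quick way to see which of these prerequisites are visible from Python is sketched below. It is not part of this repository, and the function name check_environment is hypothetical. It assumes that pillow is imported as PIL, that PyTorch is imported as torch, and that an installed JDK exposes a java executable on PATH. It reports only whether each item can be found, not whether the versions are compatible.

```python
import importlib.util
import shutil


def check_environment():
    """Return a dict mapping each prerequisite to True/False.

    Hypothetical helper: a presence check only, not a version check.
    """
    report = {}
    # Python packages from the pip command above, plus PyTorch.
    for module in ("tqdm", "PIL", "pretrainedmodels", "nltk", "torch"):
        report[module] = importlib.util.find_spec(module) is not None
    # Command-line tools; "java" stands in for the JDK (assumption).
    for tool in ("ffmpeg", "java"):
        report[tool] = shutil.which(tool) is not None
    if report["torch"]:
        import torch

        report["cuda"] = bool(torch.cuda.is_available())
        report["cudnn"] = bool(torch.backends.cudnn.is_available())
    else:
        report["cuda"] = False
        report["cudnn"] = False
    return report
```

Printing the returned dict shows at a glance which entries are False and still need to be installed or configured.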

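Usage

Make sure the dataset is configured correctly.
Make sure the dependencies are installed correctly.
Extract the data: set the train/val/test split parameters you want in makedata.py, then run that script.
Run the following commands in order (adjust the batch_size and saved_model arguments to your own setup!):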

python prepro_feats.py --output_dir data/feats/resnet152 --model resnet152
python prepro_vocab.py
python train.py --epochs 3001 --batch_size 1 --checkpoint_path data/save --feats_dir data/feats/resnet152 --model S2VTAttModel --with_c3d 0 --dim_vid 2048
python eval.py --recover_opt data/save/opt_info.json --saved_model data/save/model_10.pth --batch_size 1
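
The four commands above can also be driven from a single Python script that stops at the first failure. This is a minimal sketch, not something shipped with the repository: build_commands and run_pipeline are hypothetical names, and the only values exposed as parameters are the ones the steps say to adjust (batch_size and saved_model) plus the epoch count. All flags are copied from the commands above.

```python
import subprocess
import sys


def build_commands(batch_size=1, saved_model="data/save/model_10.pth", epochs=3001):
    """Return the four pipeline commands as argument lists (hypothetical helper)."""
    py = sys.executable  # use the interpreter that runs this script
    feats = "data/feats/resnet152"
    return [
        [py, "prepro_feats.py", "--output_dir", feats, "--model", "resnet152"],
        [py, "prepro_vocab.py"],
        [py, "train.py", "--epochs", str(epochs), "--batch_size", str(batch_size),
         "--checkpoint_path", "data/save", "--feats_dir", feats,
         "--model", "S2VTAttModel", "--with_c3d", "0", "--dim_vid", "2048"],
        [py, "eval.py", "--recover_opt", "data/save/opt_info.json",
         "--saved_model", saved_model, "--batch_size", str(batch_size)],
    ]


def run_pipeline(commands):
    """Run each command in order; check=True raises on the first failure."""
    for cmd in commands:
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling run_pipeline(build_commands(batch_size=32)) from the repository root would run feature extraction, vocabulary building, training, and evaluation one after another.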

Speed test

The following results were measured on a single 2080Ti.

Preprocessing (ResNet152 feature extraction): 40 min in total

Training speed (batch_size=32): 6.20 it/s
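
To relate the iteration rate to dataset size: at batch_size=32, 6.20 it/s corresponds to roughly 6.20 × 32 ≈ 198 samples per second. The sketch below turns that arithmetic into two small functions. The function names and the sample count in the usage note are hypothetical, and the estimate ignores data loading, checkpointing, and evaluation overhead.

```python
import math


def samples_per_second(it_per_s, batch_size):
    """Throughput in samples/s for a given iteration rate and batch size."""
    return it_per_s * batch_size


def epoch_seconds(num_samples, batch_size, it_per_s):
    """Rough time for one pass over num_samples, counting a partial last batch."""
    iterations = math.ceil(num_samples / batch_size)
    return iterations / it_per_s
```

For example, a hypothetical training set of 6400 samples needs 200 iterations per epoch at batch_size=32, so epoch_seconds(6400, 32, 6.20) comes to a little over 32 seconds.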

Todo

Upper/lowercase handling

References

https://github.com/xiadingZ/video-caption.pytorch

Owner

Fan Zhimin
