TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks

[Paper] [Project Website]

This repository holds the source code, pretrained models, and pre-extracted features for the TSP method.

Please cite this work if you find TSP useful for your research.

@article{alwassel2020tsp,
  title={TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks},
  author={Alwassel, Humam and Giancola, Silvio and Ghanem, Bernard},
  journal={arXiv preprint arXiv:2011.11479},
  year={2020}
}

Pre-extracted TSP Features

We provide pre-extracted features for ActivityNet v1.3 and THUMOS14 videos. The feature files are saved in H5 format. Each file maps a video name to a feature tensor of size N x 512, where N is the number of features and 512 is the feature dimension. Use the h5py Python package to read the feature files. Not familiar with H5 files or h5py? Here is a quick start guide.
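As a minimal sketch of reading one video's features with h5py: the file name `demo_features.h5`, the key `v_example`, and the helper `load_video_features` are hypothetical and stand in for a downloaded feature file and a real video name. The sketch writes a tiny stand-in file first so that it runs without the actual download.

```python
import os
import tempfile

import h5py
import numpy as np


def load_video_features(h5_path, video_name):
    """Return the features stored under `video_name` as a NumPy array."""
    with h5py.File(h5_path, "r") as f:
        return f[video_name][:]  # copy the whole dataset into memory


# Build a tiny stand-in file (hypothetical name and key) so the sketch
# runs without the real feature files.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_features.h5")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("v_example", data=np.zeros((10, 512), dtype=np.float32))

features = load_video_features(demo_path, "v_example")
print(features.shape)  # (N, 512); N is 10 for this stand-in file
```

With a real feature file, `list(f.keys())` inside the `with` block lists the available video names.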

For ActivityNet v1.3 dataset

Download: [train subset] [valid subset] [test subset]

Details: The features are extracted from the R(2+1)D-34 encoder pretrained with TSP on ActivityNet (released model) using clips of 16 frames at a frame rate of 15 fps and a stride of 16 frames (i.e., non-overlapping clips). This gives one feature vector per 16/15 ~= 1.067 seconds.

For THUMOS14 dataset

Download: [valid subset] [test subset]

Details: The features are extracted from the R(2+1)D-34 encoder pretrained with TSP on THUMOS14 (released model) using clips of 16 frames at a frame rate of 15 fps and a stride of 1 frame (i.e., dense overlapping clips). This gives one feature vector per 1/15 ~= 0.067 seconds.
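The arithmetic behind both settings can be sketched as follows. The helper `feature_time_span` is hypothetical, and it assumes that the first clip starts at time 0 and that feature i covers frames i * stride to i * stride + 16 of the 15 fps video; check this against the extraction scripts before relying on exact boundaries.

```python
def feature_time_span(index, stride, fps=15.0, clip_length=16):
    """Return (start, end) in seconds of the clip behind feature `index`.

    Assumes the first clip starts at t = 0 (an assumption, not confirmed
    by the source).
    """
    start = index * stride / fps
    end = start + clip_length / fps
    return start, end


# ActivityNet setting: stride of 16 frames, non-overlapping clips.
anet_start, anet_end = feature_time_span(3, stride=16)

# THUMOS14 setting: stride of 1 frame, dense overlapping clips.
thumos_start, thumos_end = feature_time_span(3, stride=1)

print(anet_start, anet_end)
print(thumos_start, thumos_end)
```

Consecutive ActivityNet features are 16/15 seconds apart and do not overlap, while consecutive THUMOS14 features are 1/15 seconds apart and share 15 of their 16 frames.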

Setup

Clone this repository and create the conda environment.

git clone https://github.com/HumamAlwassel/TSP.git
cd TSP
conda env create -f environment.yml
conda activate tsp

Data Preprocessing

Follow the instructions here to download and preprocess the input data.

Training

We provide training scripts for the TSP models and the TAC baselines here.

Feature Extraction

You can extract features from released pretrained models or from local checkpoints using the scripts here.

Acknowledgment: Our source code borrows implementation ideas from the pytorch/vision and facebookresearch/VMZ repositories.

Comments
  • LOSS does not decrease during training

    My dataset is small: 1500 videos, all under 10 seconds long. The current training results of this model are as follows: [image]

    The experimental settings are Batch_size=32 and FACTOR=2. Is this behavior normal? If not, what should be done?

    opened by ZChengLong578 5
  • H5 files generated about GVF features

    Hi @HumamAlwassel, thanks for your excellent work and for sharing the code. While training on my own dataset, I read your explanation of GVF feature generation. When I get to step 3, do I need to combine the .pkl files generated for the training and validation sets into .h5 files?

    opened by ZChengLong578 5
  • The LOSS value is too large and does not decrease

    Hi @HumamAlwassel, I'm sorry to bother you again. Previously I trained with little or no background (no action) data. I have now added more background (no action) samples, but the loss value is very large and does not decrease. The situation is shown in the following figure: [image] Here are the files for the training set and validation set: [image] What can I do to solve this problem?

    opened by ZChengLong578 3
  • Use the pretraining model to train other datasets

    Hi @HumamAlwassel, after downloading the pretrained model as you suggested, I overwrote the value of epoch to 0 and then made the following changes in the code: [image] [image] [image] Could you take a look and tell me whether these changes are correct? Or should I replace the initial tac-on-kinetics pretrained weights with this model instead of using it in resume?

    opened by ZChengLong578 2
  • Inference unseen video using pretrained model

    Hi @HumamAlwassel, thanks for your excellent work. I really appreciate it. I've trained your model on my own dataset, and I am now wondering how to use the trained model to run inference on unseen videos. Could you give some examples that export the results for a video, such as the action label and its start and end times?

    Best regards,

    opened by t2kien 2
  • Data sampling problems

    Hi @HumamAlwassel, I'm sorry to trouble you again. The actions in my dataset are short, and many partitions were removed, as shown below: [image] However, on closer inspection, the problem does not seem to be the length of the video. Actions with a length of 0-1.5 seconds are in the video, but actions with a length of 1.5-3 seconds are not. Why is this? [image]

    opened by ZChengLong578 2
  •  RuntimeError(f'<UntrimmedVideoDataset>: got clip of length {vframes.shape[0]} != {self.clip_length}.'

    Traceback (most recent call last):
      File "train.py", line 290, in <module>
        main(args)
      File "train.py", line 260, in main
        train_one_epoch(model=model, criterion=criterion, optimizer=optimizer, lr_scheduler=lr_scheduler,
      File "train.py", line 63, in train_one_epoch
        for sample in metric_logger.log_every(data_loader, print_freq, header, device=device):
      File "/media/bruce/2T/projects/TSP/train/../common/utils.py", line 137, in log_every
        for obj in iterable:
      File "/home/bruce/anaconda2/envs/tsp/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
        data = self._next_data()
      File "/home/bruce/anaconda2/envs/tsp/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
        return self._process_data(data)
      File "/home/bruce/anaconda2/envs/tsp/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
        data.reraise()
      File "/home/bruce/anaconda2/envs/tsp/lib/python3.8/site-packages/torch/_utils.py", line 394, in reraise
        raise self.exc_type(msg)
    RuntimeError: Caught RuntimeError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File "/home/bruce/anaconda2/envs/tsp/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
        data = fetcher.fetch(index)
      File "/home/bruce/anaconda2/envs/tsp/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/bruce/anaconda2/envs/tsp/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/media/bruce/2T/projects/TSP/train/untrimmed_video_dataset.py", line 86, in __getitem__
        raise RuntimeError(f'<UntrimmedVideoDataset>: got clip of length {vframes.shape[0]} != {self.clip_length}.'
    RuntimeError: <UntrimmedVideoDataset>: got clip of length 15 != 16.filename=/mnt/nas/bruce14t/THUMOS14/valid/video_validation_0000420.mp4, clip_t_start=526.7160991305855, clip_t_end=527.7827657972522, fps=30.0, t_start=498.2, t_end=546.9

    I am very impressed by your wonderful work. When I tried to reproduce the training by running bash train_tsp_on_thumos14.sh for the THUMOS14 dataset, I got the data loading error above. The calculation of the start and end of the input clips (lines 74-78 of train/untrimmed_video_dataset.py) does not seem to work for all clips. Could you provide some help with it? Thank you very much in advance.

    opened by bruceyo 2
  • How do I calculate mean and std for a new dataset?

    Thanks for your inspiring code with detailed explanations! I have learned a lot from it, and now I'm trying to run some experiments on another dataset. But some implementation details confuse me.

    I notice that the dataset transform includes a normalization step:

    normalize = T.Normalize(mean=[0.43216, 0.394666, 0.37645], std=[0.22803, 0.22145, 0.216989])

    So how do I calculate the mean and std for a new dataset? Should I extract frames from videos first, then calculate mean & std inside all the frames in all videos for each RGB channel?

    opened by xjtupanda 1
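For the mean and std question above, the per-channel computation it describes can be sketched as below. This is a minimal illustration and not the repository's method: the helper `channel_mean_std` is hypothetical, frames are assumed to be already decoded into H x W x 3 uint8 RGB arrays, and pixel values are scaled to [0, 1] because the values in the quoted `T.Normalize` call are all below 1.

```python
import numpy as np


def channel_mean_std(frames):
    """Per-channel mean and std over an iterable of HxWx3 uint8 RGB frames.

    Pixel values are scaled to [0, 1] before accumulating.
    """
    total = np.zeros(3, dtype=np.float64)
    total_sq = np.zeros(3, dtype=np.float64)
    count = 0
    for frame in frames:
        pixels = frame.reshape(-1, 3).astype(np.float64) / 255.0
        total += pixels.sum(axis=0)
        total_sq += (pixels ** 2).sum(axis=0)
        count += pixels.shape[0]
    mean = total / count
    # Var[x] = E[x^2] - E[x]^2; clip at 0 to guard against rounding error.
    std = np.sqrt(np.maximum(total_sq / count - mean ** 2, 0.0))
    return mean, std


# Two synthetic frames: one all white, one all black.
demo_frames = [
    np.full((2, 2, 3), 255, dtype=np.uint8),
    np.zeros((2, 2, 3), dtype=np.uint8),
]
mean, std = channel_mean_std(demo_frames)
print(mean, std)
```

Accumulating running sums keeps memory use constant, so the frames can come from a generator that decodes one video at a time.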
  • Similar to issue #11 getting RuntimeError(f'<UntrimmedVideoDataset>: got clip of length {vframes.shape[0]} != {self.clip_length}.'

    I am working with ActivityNet-v1.3 data converted to grayscale.

    I followed the preprocessing step highlighted here.

    However, I am still facing an issue similar to #11. I wanted to check whether I am missing something or whether there are any known fixes.

    Example from the log:

    1. RuntimeError: <UntrimmedVideoDataset>: got clip of length 15 != 16.filename=~/ActivityNet/grayscale_split/train/v_bNuRrXSjJl0.mp4, clip_t_start=227.63093165194988, clip_t_end=228.69759831861654, fps=30.0, t_start=219.1265882558503, t_end=228.7

    2. RuntimeError: <UntrimmedVideoDataset>: got clip of length 13 != 16.filename=~/ActivityNet/grayscale_split/train/v_nTNkGOtp7aQ.mp4, clip_t_start=33.341372258903775, clip_t_end=34.408038925570445, fps=30.0, t_start=25.58139772698908, t_end=34.53333333333333

    3. RuntimeError: <UntrimmedVideoDataset>: got clip of length 1 != 16.filename=~/ActivityNet/grayscale_split/train/v_7Iy7Cjv2SAE.mp4, clip_t_start=190.79558490339477, clip_t_end=191.86225157006143, fps=30.0, t_start=131.42849249141963, t_end=195.0

    Also, is there a recommended way to skip these files instead of raising an error during training? The errors above occurred in different runs and at different epochs.

    opened by vc-30 1
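One generic way to skip samples that fail to load, as asked in the issue above, is to wrap the dataset and fall back to a neighboring index when loading raises an error. The sketch below is an assumption-laden illustration, not a fix provided by the repository: `SkipBadClips` and `_ToyDataset` are hypothetical names, and the wrapper only relies on the wrapped object supporting `len()` and indexing.

```python
class SkipBadClips:
    """Wrap an indexable dataset; on RuntimeError, try the next index instead."""

    def __init__(self, dataset, max_retries=10):
        self.dataset = dataset
        self.max_retries = max_retries

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        for attempt in range(self.max_retries):
            try:
                return self.dataset[(idx + attempt) % len(self.dataset)]
            except RuntimeError:
                continue  # this sample failed to load; move on to the next one
        raise RuntimeError(
            f"no loadable sample within {self.max_retries} tries from index {idx}"
        )


class _ToyDataset:
    """Stand-in dataset whose sample 1 always fails, mimicking a short clip."""

    def __len__(self):
        return 4

    def __getitem__(self, idx):
        if idx == 1:
            raise RuntimeError("got clip of length 15 != 16")
        return idx * 10


wrapped = SkipBadClips(_ToyDataset())
samples = [wrapped[i] for i in range(len(wrapped))]
print(samples)  # [0, 20, 20, 30]
```

Note that falling back to a neighbor duplicates some samples and hides the underlying cause, so it is worth logging the skipped indices and inspecting those videos separately.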
  • Accuracy doesn't increase

    Thank you for your reply! I used the above code to train on my dataset and found that the accuracy has barely changed and has remained around 3. Here is the output of the training: [image] Do you know what caused it?

    opened by ZChengLong578 1
  • question about pretrain-model

    Hi, thank you for your excellent work. I have a problem with your model. I am using the TSP features extracted with the model pretrained on ActivityNet. When the objects in my video do not appear in ActivityNet, the model fails to recognize them. For example, the only animals in ActivityNet are dogs and horses, so when my video shows a cat, I run into trouble. My guess is that this happens because the model has never seen cats. One solution I have in mind is to start from ImageNet-22k pretrained weights and then extract TSP features on ActivityNet. I don't know whether my thinking is right. If it is, could you please update your code to support ImageNet-22k pretrained weights? Thank you very much for your excellent work.

    opened by qt2139 1
Releases(thumos14_features)
Owner
Humam Alwassel
PhD Student, Computer Vision Researcher, and Deep Learning "Hacker".