PyTorch implementation for our NeurIPS 2021 Spotlight paper "Long Short-Term Transformer for Online Action Detection".

Overview

Long Short-Term Transformer for Online Action Detection

Introduction

This is a PyTorch implementation for our NeurIPS 2021 Spotlight paper "Long Short-Term Transformer for Online Action Detection".

network

Environment

  • The code is developed with CUDA 10.2, Python >= 3.7.7, PyTorch >= 1.7.1

    1. [Optional but recommended] create a new conda environment.

      conda create -n lstr python=3.7.7
      

      And activate the environment.

      conda activate lstr
      
    2. Install the requirements

      pip install -r requirements.txt
      

Data Preparation

  1. Download the THUMOS'14 and TVSeries datasets.

  2. Extract feature representations for video frames.

    • For ActivityNet pretrained features, we use the ResNet-50 model for the RGB and optical flow inputs. We recommend to use this checkpoint in MMAction2.

    • For Kinetics pretrained features, we use the ResNet-50 model for the RGB inputs. We recommend to use this checkpoint in MMAction2. We use the BN-Inception model for the optical flow inputs. We recommend to use the model here.

    Note: We compute the optical flow using DenseFlow.

  3. If you want to use our dataloaders, please make sure to put the files as the following structure:

    • THUMOS'14 dataset:

      $YOUR_PATH_TO_THUMOS_DATASET
      ├── rgb_kinetics_resnet50/
      |   ├── video_validation_0000051.npy (of size L x 2048)
      │   ├── ...
      ├── flow_kinetics_bninception/
      |   ├── video_validation_0000051.npy (of size L x 1024)
      |   ├── ...
      ├── target_perframe/
      |   ├── video_validation_0000051.npy (of size L x 22)
      |   ├── ...
      
    • TVSeries dataset:

      $YOUR_PATH_TO_TVSERIES_DATASET
      ├── rgb_kinetics_resnet50/
      |   ├── Breaking_Bad_ep1.npy (of size L x 2048)
      │   ├── ...
      ├── flow_kinetics_bninception/
      |   ├── Breaking_Bad_ep1.npy (of size L x 1024)
      |   ├── ...
      ├── target_perframe/
      |   ├── Breaking_Bad_ep1.npy (of size L x 31)
      |   ├── ...
      
  4. Create softlinks of datasets:

    cd long-short-term-transformer
    ln -s $YOUR_PATH_TO_THUMOS_DATASET data/THUMOS
    ln -s $YOUR_PATH_TO_TVSERIES_DATASET data/TVSeries
    

Training

Training LSTR with 512 seconds long-term memory and 8 seconds short-term memory requires less 3 GB GPU memory.

The commands are as follows.

cd long-short-term-transformer
# Training from scratch
python tools/train_net.py --config_file $PATH_TO_CONFIG_FILE --gpu $CUDA_VISIBLE_DEVICES
# Finetuning from a pretrained model
python tools/train_net.py --config_file $PATH_TO_CONFIG_FILE --gpu $CUDA_VISIBLE_DEVICES \
    MODEL.CHECKPOINT $PATH_TO_CHECKPOINT

Online Inference

There are three kinds of evaluation methods in our code.

  • First, you can use the config SOLVER.PHASES "['train', 'test']" during training. This process devides each test video into non-overlapping samples, and makes prediction on the all the frames in the short-term memory as if they were the latest frame. Note that this evaluation result is not the final performance, since (1) for most of the frames, their short-term memory is not fully utlized and (2) for simplicity, samples in the boundaries are mostly ignored.

    cd long-short-term-transformer
    # Inference along with training
    python tools/train_net.py --config_file $PATH_TO_CONFIG_FILE --gpu $CUDA_VISIBLE_DEVICES \
        SOLVER.PHASES "['train', 'test']"
    
  • Second, you could run the online inference in batch mode. This process evaluates all video frames by considering each of them as the latest frame and filling the long- and short-term memories by tracing back in time. Note that this evaluation result matches the numbers reported in the paper, but batch mode cannot be further accelerated as descibed in paper's Sec 3.6. On the other hand, this mode can run faster when you use a large batch size, and we recomand to use it for performance benchmarking.

    cd long-short-term-transformer
    # Online inference in batch mode
    python tools/test_net.py --config_file $PATH_TO_CONFIG_FILE --gpu $CUDA_VISIBLE_DEVICES \
        MODEL.CHECKPOINT $PATH_TO_CHECKPOINT MODEL.LSTR.INFERENCE_MODE batch
    
  • Third, you could run the online inference in stream mode. This process tests frame by frame along the entire video, from the beginning to the end. Note that this evaluation result matches the both LSTR's performance and runtime reported in the paper. It processes the entire video as LSTR is applied to real-world scenarios. However, currently it only supports to test one video at each time.

    cd long-short-term-transformer
    # Online inference in stream mode
    python tools/test_net.py --config_file $PATH_TO_CONFIG_FILE --gpu $CUDA_VISIBLE_DEVICES \
        MODEL.CHECKPOINT $PATH_TO_CHECKPOINT MODEL.LSTR.INFERENCE_MODE stream DATA.TEST_SESSION_SET "['$VIDEO_NAME']"
    

Evaluation

Evaluate LSTR's performance for online action detection using perframe mAP or mcAP.

cd long-short-term-transformer
python tools/eval/eval_perframe --pred_scores_file $PRED_SCORES_FILE

Evaluate LSTR's performance at different action stages by evaluating each decile (ten-percent interval) of the video frames separately.

cd long-short-term-transformer
python tools/eval/eval_perstage --pred_scores_file $PRED_SCORES_FILE

Citations

If you are using the data/code/model provided here in a publication, please cite our paper:

@inproceedings{xu2021long,
	title={Long Short-Term Transformer for Online Action Detection},
	author={Xu, Mingze and Xiong, Yuanjun and Chen, Hao and Li, Xinyu and Xia, Wei and Tu, Zhuowen and Soatto, Stefano},
	booktitle={Conference on Neural Information Processing Systems (NeurIPS)},
	year={2021}
}

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Code repository for EMNLP 2021 paper 'Adversarial Attacks on Knowledge Graph Embeddings via Instance Attribution Methods'

Adversarial Attacks on Knowledge Graph Embeddings via Instance Attribution Methods This is the code repository to accompany the EMNLP 2021 paper on ad

Peru Bhardwaj 7 Sep 25, 2022
This is an official implementation for "Self-Supervised Learning with Swin Transformers".

Self-Supervised Learning with Vision Transformers By Zhenda Xie*, Yutong Lin*, Zhuliang Yao, Zheng Zhang, Qi Dai, Yue Cao and Han Hu This repo is the

Swin Transformer 529 Jan 02, 2023
Camera calibration & 3D pose estimation tools for AcinoSet

AcinoSet: A 3D Pose Estimation Dataset and Baseline Models for Cheetahs in the Wild Daniel Joska, Liam Clark, Naoya Muramatsu, Ricardo Jericevich, Fre

African Robotics Unit 42 Nov 16, 2022
Official Implementation for Fast Training of Neural Lumigraph Representations using Meta Learning.

Fast Training of Neural Lumigraph Representations using Meta Learning Project Page | Paper | Data Alexander W. Bergman, Petr Kellnhofer, Gordon Wetzst

Alex 39 Oct 08, 2022
A Streamlit demo demonstrating the Deep Dream technique. Adapted from the TensorFlow Deep Dream tutorial.

Streamlit Demo: Deep Dream A Streamlit demo demonstrating the Deep Dream technique. Adapted from the TensorFlow Deep Dream tutorial How to run this de

Streamlit 11 Dec 12, 2022
Official implementation of the ICCV 2021 paper "Joint Inductive and Transductive Learning for Video Object Segmentation"

JOINT This is the official implementation of Joint Inductive and Transductive learning for Video Object Segmentation, to appear in ICCV 2021. @inproce

Yunyao 35 Oct 16, 2022
The Python3 import playground

The Python3 import playground I have been confused about python modules and packages, this text tries to clear the topic up a bit. Sources: https://ch

Michael Moser 5 Feb 22, 2022
Detail-Preserving Transformer for Light Field Image Super-Resolution

DPT Official Pytorch implementation of the paper "Detail-Preserving Transformer for Light Field Image Super-Resolution" accepted by AAAI 2022 . Update

50 Jan 01, 2023
Locationinfo - A script helps the user to show network information such as ip address

Description This script helps the user to show network information such as ip ad

Roxcoder 1 Dec 30, 2021
The official implementation of NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation [ICLR-2021]. https://arxiv.org/pdf/2101.12378.pdf

NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation [ICLR-2021] Release Notes The offical PyTorch implementation of NeMo, p

Angtian Wang 76 Nov 23, 2022
A collection of metrics for evaluating timbre dissimilarity using the TorchMetrics API

Timbre Dissimilarity Metrics A collection of metrics for evaluating timbre dissimilarity using the TorchMetrics API Installation pip install -e . Usag

Ben Hayes 21 Jan 05, 2022
OCR-D wrapper for detectron2 based segmentation models

ocrd_detectron2 OCR-D wrapper for detectron2 based segmentation models Introduction Installation Usage OCR-D processor interface ocrd-detectron2-segm

Robert Sachunsky 13 Dec 06, 2022
Azua - build AI algorithms to aid efficient decision-making with minimum data requirements.

Project Azua 0. Overview Many modern AI algorithms are known to be data-hungry, whereas human decision-making is much more efficient. The human can re

Microsoft 197 Jan 06, 2023
[CVPR 2022 Oral] MixFormer: End-to-End Tracking with Iterative Mixed Attention

MixFormer The official implementation of the CVPR 2022 paper MixFormer: End-to-End Tracking with Iterative Mixed Attention [Models and Raw results] (G

Multimedia Computing Group, Nanjing University 235 Jan 03, 2023
PyTorch 1.5 implementation for paper DECOR-GAN: 3D Shape Detailization by Conditional Refinement.

DECOR-GAN PyTorch 1.5 implementation for paper DECOR-GAN: 3D Shape Detailization by Conditional Refinement, Zhiqin Chen, Vladimir G. Kim, Matthew Fish

Zhiqin Chen 72 Dec 31, 2022
Pytorch implementation of Decoupled Spatial-Temporal Transformer for Video Inpainting

Decoupled Spatial-Temporal Transformer for Video Inpainting By Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu Sun, Xiaogang Wang, J

51 Dec 13, 2022
PyTorch GPU implementation of the ES-RNN model for time series forecasting

Fast ES-RNN: A GPU Implementation of the ES-RNN Algorithm A GPU-enabled version of the hybrid ES-RNN model by Slawek et al that won the M4 time-series

Kaung 305 Jan 03, 2023
Phy-Q: A Benchmark for Physical Reasoning

Phy-Q: A Benchmark for Physical Reasoning Cheng Xue*, Vimukthini Pinto*, Chathura Gamage* Ekaterina Nikonova, Peng Zhang, Jochen Renz School of Comput

29 Dec 19, 2022
Python periodic table module

elemenpy Hello! elements.py is a small Python periodic table module that is used for calling certain information about an element. Installation Instal

Eric Cheng 2 Dec 27, 2021
A Pytorch implementation of "Splitter: Learning Node Representations that Capture Multiple Social Contexts" (WWW 2019).

Splitter ⠀⠀ A PyTorch implementation of Splitter: Learning Node Representations that Capture Multiple Social Contexts (WWW 2019). Abstract Recent inte

Benedek Rozemberczki 201 Nov 09, 2022