[ICCV 2021] Relaxed Transformer Decoders for Direct Action Proposal Generation

Last update: Nov 30, 2022

RTD-Net (ICCV 2021)

This repository holds the code for the paper "Relaxed Transformer Decoders for Direct Action Proposal Generation", accepted at ICCV 2021.

News

[2021.8.17] We released the code, checkpoint, and features for THUMOS14.

Overview

This paper presents a simple, end-to-end learnable framework (RTD-Net) for direct action proposal generation, built by re-purposing a Transformer-like architecture. Thanks to parallel decoding of multiple proposals with explicit context modeling, RTD-Net outperforms previous state-of-the-art methods on the temporal action proposal generation task on THUMOS14 and also yields superior action detection performance on this dataset. In addition, because it needs no NMS post-processing, our detection pipeline is more efficient than previous methods.

Dependencies

Python 3.7 or higher
PyTorch 1.6 or higher
Torchvision
Numpy 1.19.2
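The minimum versions listed above can be checked before training. The snippet below is a minimal sketch, not part of this repo: `parse_version`, `meets_minimum`, and the `MINIMUMS` table are hypothetical names introduced for illustration. It compares versions numerically, so that "1.10" is correctly treated as newer than "1.6".

```python
import re
import sys

# Minimum versions copied from the dependency list above (illustrative table).
MINIMUMS = {"python": "3.7", "torch": "1.6"}


def parse_version(text):
    """Turn a string such as '1.6.0+cu101' into the tuple (1, 6, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no numeric version found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, required):
    """Return True if the installed version is at least the required one."""
    a, b = parse_version(installed), parse_version(required)
    # Pad with zeros so that (1, 6) and (1, 6, 0) compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    python_version = ".".join(str(n) for n in sys.version_info[:3])
    print("python ok:", meets_minimum(python_version, MINIMUMS["python"]))
    try:
        import torch
        print("torch ok:", meets_minimum(torch.__version__, MINIMUMS["torch"]))
    except ImportError:
        print("torch is not installed")
```

Comparing tuples of integers rather than raw strings avoids the common pitfall where "1.10.0" sorts before "1.6.0" lexicographically.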

Data Preparation

To reproduce the results on THUMOS14 without further changes:

Download the data from GoogleDrive.
Place I3D_features and TEM_scores into the folder data.
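Before launching training, it may help to confirm that the two folders ended up where the steps above put them. This is a small sketch under the assumption that `data` sits in the current working directory; `missing_data_dirs` is a hypothetical helper, not a function from this repo.

```python
from pathlib import Path

# Folder names taken from the data preparation steps above.
REQUIRED_DIRS = ("I3D_features", "TEM_scores")


def missing_data_dirs(data_root="data"):
    """Return the required sub-folders that are absent under data_root."""
    root = Path(data_root)
    return [name for name in REQUIRED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs()
    if missing:
        print("Missing under data/:", ", ".join(missing))
    else:
        print("Data layout looks complete.")
```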

Checkpoint

Dataset	AR@50	AR@100	AR@200	AR@500	checkpoint
THUMOS14	41.52	49.33	56.41	62.91	link
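The scores in the table above are average-recall figures for the generated proposals: the share of ground-truth action segments recovered by the top-N proposals of a video, averaged over several temporal IoU (tIoU) thresholds. The snippet below is a simplified single-video sketch of that idea, not the evaluation code used in this repo. The function names and the default threshold list are assumptions chosen for illustration, and benchmark protocols typically use a finer threshold grid and average over all videos.

```python
def temporal_iou(a, b):
    """Intersection over union of two (start, end) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_n(proposals, ground_truth, n, tiou):
    """Fraction of ground-truth segments matched by any of the top-n proposals.

    proposals: list of (start, end, score); ground_truth: list of (start, end).
    """
    if not ground_truth:
        return 0.0
    top = sorted(proposals, key=lambda p: p[2], reverse=True)[:n]
    hits = sum(
        any(temporal_iou(gt, p[:2]) >= tiou for p in top)
        for gt in ground_truth
    )
    return hits / len(ground_truth)


def average_recall_at_n(proposals, ground_truth, n,
                        tious=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Mean of recall_at_n over the tIoU thresholds (illustrative defaults)."""
    return sum(recall_at_n(proposals, ground_truth, n, t) for t in tious) / len(tious)
```

For example, with proposals `[(0, 10, 0.9), (20, 30, 0.8)]` and ground truth `[(0, 10), (21, 30)]`, keeping only the top-1 proposal recovers one of the two segments, while keeping the top-2 recovers both at a tIoU threshold of 0.5.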

Training

Use train.sh to train RTD-Net.


# First stage

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=11323 --use_env main.py --window_size 100 --batch_size 32 --stage 1 --num_queries 32 --point_prob_normalize

# Second stage for relaxation mechanism

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=11324 --use_env main.py --window_size 100 --batch_size 32 --lr 1e-5 --stage 2 --epochs 10 --lr_drop 5 --num_queries 32 --point_prob_normalize --load outputs/checkpoint_best_sum_ar.pth

# Third stage for completeness head

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=11325 --use_env main.py --window_size 100 --batch_size 32 --lr 1e-4 --stage 3 --epochs 20 --num_queries 32 --point_prob_normalize --load outputs/checkpoint_best_sum_ar.pth
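The three commands above share most of their arguments and differ only in the port, the learning-rate schedule, and whether a checkpoint is loaded. The sketch below makes that structure explicit by assembling the same argument lists in Python. It is a hypothetical convenience wrapper (`STAGES`, `build_command`), not part of this repo; train.sh remains the reference, and `CUDA_VISIBLE_DEVICES` still has to be set in the environment.

```python
CHECKPOINT = "outputs/checkpoint_best_sum_ar.pth"

# Per-stage differences, copied from the commands above.
STAGES = {
    1: {"port": 11323, "extra": []},
    2: {"port": 11324,
        "extra": ["--lr", "1e-5", "--epochs", "10", "--lr_drop", "5",
                  "--load", CHECKPOINT]},
    3: {"port": 11325,
        "extra": ["--lr", "1e-4", "--epochs", "20", "--load", CHECKPOINT]},
}


def build_command(stage, nproc=2):
    """Return the launch command for one training stage as a list of arguments."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    cfg = STAGES[stage]
    launcher = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={nproc}", f"--master_port={cfg['port']}", "--use_env",
    ]
    # Arguments shared by every stage.
    common = [
        "main.py", "--window_size", "100", "--batch_size", "32",
        "--stage", str(stage), "--num_queries", "32", "--point_prob_normalize",
    ]
    return launcher + common + cfg["extra"]


if __name__ == "__main__":
    for stage in sorted(STAGES):
        print(" ".join(build_command(stage)))
```

Each list can be passed to `subprocess.run` to execute the stages in order. Stages 2 and 3 both start from the checkpoint written to `outputs/`.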

Testing

Run inference with test.sh.

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=11325 --use_env main.py --window_size 100 --batch_size 32 --lr 1e-4 --stage 3 --epochs 20 --num_queries 32 --point_prob_normalize --eval --resume outputs/checkpoint_best_sum_ar.pth

References

We especially thank the contributors of BSN, G-TAD, and DETR for providing helpful code.

Citations

If you find our work helpful, please cite our paper.

@InProceedings{Tan_2021_RTD,
    author    = {Tan, Jing and Tang, Jiaqi and Wang, Limin and Wu, Gangshan},
    title     = {Relaxed Transformer Decoders for Direct Action Proposal Generation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {13526-13535}
}

Contact

For any questions, please file an issue or contact:

Jing Tan: [email protected]
Jiaqi Tang: [email protected]

Owner

Multimedia Computing Group, Nanjing University