Official PyTorch Implementation of paper EAN: Event Adaptive Network for Efficient Action Recognition

Overview

EAN: Event Adaptive Network

PWC

PyTorch Implementation of paper:

EAN: Event Adaptive Network for Enhanced Action Recognition

Yuan Tian, Yichao Yan, Xiongkuo Min, Guo Lu, Guangtao Zhai, Guodong Guo, and Zhiyong Gao

[ArXiv]

Main Contribution

Efficiently modeling spatial-temporal information in videos is crucial for action recognition. In this paper, we propose a unified action recognition framework to investigate the dynamic nature of video content by introducing the following designs. First, when extracting local cues, we generate the spatial-temporal kernels of dynamic-scale to adaptively fit the diverse events. Second, to accurately aggregate these cues into a global video representation, we propose to mine the interactions only among a few selected foreground objects by a Transformer, which yields a sparse paradigm. We call the proposed framework as Event Adaptive Network (EAN) because both key designs are adaptive to the input video content. To exploit the short-term motions within local segments, we propose a novel and efficient Latent Motion Code (LMC) module, further improving the performance of the framework.

Content

Dependencies

Please make sure the following libraries are installed successfully:

Data Preparation

Following the common practice, we need to first extract videos into frames for fast data loading. Please refer to TSN repo for the detailed guide of data pre-processing. We have successfully trained on Something-Something-V1 and V2, Kinetics, Diving48 datasets with this codebase. Basically, the processing of video data can be summarized into 3 steps:

  1. Extract frames from videos:

  2. Generate file lists needed for dataloader:

    • Each line of the list file will contain a tuple of (extracted video frame folder name, video frame number, and video groundtruth class). A list file looks like this:

      video_frame_folder 100 10
      video_2_frame_folder 150 31
      ...
      
    • Or you can use off-the-shelf tools provided by the repos: data_process/gen_label_xxx.py

  3. Edit dataset config information in datasets_video.py

Pretrained Models

Here, we provide the pretrained models of EAN models on Something-Something-V1 datasets. Recognizing actions in this dataset requires strong temporal modeling ability. EAN achieves state-of-the-art performance on these datasets. Notably, our method even surpasses optical flow based methods while with only RGB frames as input.

Something-Something-V1

Model Backbone FLOPs Val Top1 Val Top5 Checkpoints
EAN8F(RGB+LMC) ResNet-50 37G 53.4 81.1 [Jianguo Cloud]
EAN16(RGB+LMC) 74G 54.7 82.3
EAN16+8(RGB+LMC) 111G 57.2 83.9
EAN2 x (16+8)(RGB+LMC) 222G 57.5 84.3

Testing

For example, to test the EAN models on Something-Something-V1, you can first put the downloaded .pth.tar files into the "pretrained" folder and then run:

# test EAN model with 8frames clip
bash scripts/test/sthv1/RGB_LMC_8F.sh

# test EAN model with 16frames clip
bash scripts/test/sthv1/RGB_LMC_16F.sh

Training

We provided several scripts to train EAN with this repo, please refer to "scripts" folder for more details. For example, to train PAN on Something-Something-V1, you can run:

# train EAN model with 8frames clip
bash scripts/train/sthv1/RGB_LMC_8F.sh

Notice that you should scale up the learning rate with batch size. For example, if you use a batch size of 32 you should set learning rate to 0.005.

Other Info

References

This repository is built upon the following baseline implementations for the action recognition task.

Citation

Please [★star] this repo and [cite] the following arXiv paper if you feel our EAN useful to your research:

@misc{tian2021ean,
      title={EAN: Event Adaptive Network for Enhanced Action Recognition}, 
      author={Yuan Tian and Yichao Yan and Xiongkuo Min and Guo Lu and Guangtao Zhai and Guodong Guo and Zhiyong Gao},
      year={2021},
      eprint={2107.10771},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Contact

For any questions, please feel free to open an issue or contact:

Yuan Tian: [email protected]
Owner
TianYuan
TianYuan
Segmentation models with pretrained backbones. PyTorch.

Python library with Neural Networks for Image Segmentation based on PyTorch. The main features of this library are: High level API (just two lines to

Pavel Yakubovskiy 6.6k Jan 06, 2023
PIXIE: Collaborative Regression of Expressive Bodies

PIXIE: Collaborative Regression of Expressive Bodies [Project Page] This is the official Pytorch implementation of PIXIE. PIXIE reconstructs an expres

Yao Feng 331 Jan 04, 2023
FLAVR is a fast, flow-free frame interpolation method capable of single shot multi-frame prediction

FLAVR is a fast, flow-free frame interpolation method capable of single shot multi-frame prediction. It uses a customized encoder decoder architecture with spatio-temporal convolutions and channel ga

Tarun K 280 Dec 23, 2022
Fewshot-face-translation-GAN - Generative adversarial networks integrating modules from FUNIT and SPADE for face-swapping.

Few-shot face translation A GAN based approach for one model to swap them all. The table below shows our priliminary face-swapping results requiring o

768 Dec 24, 2022
Implicit Graph Neural Networks

Implicit Graph Neural Networks This repository is the official PyTorch implementation of "Implicit Graph Neural Networks". Fangda Gu*, Heng Chang*, We

Heng Chang 48 Nov 29, 2022
This is a collection of our NAS and Vision Transformer work.

AutoML - Neural Architecture Search This is a collection of our AutoML-NAS work iRPE (NEW): Rethinking and Improving Relative Position Encoding for Vi

Microsoft 832 Jan 08, 2023
PyTorch code for ICPR 2020 paper Future Urban Scene Generation Through Vehicle Synthesis

Future urban scene generation through vehicle synthesis This repository contains Pytorch code for the ICPR2020 paper "Future Urban Scene Generation Th

Alessandro Simoni 4 Oct 11, 2021
Haze Removal can remove slight to extreme cases of haze affecting an image

Haze Removal can remove slight to extreme cases of haze affecting an image. Its most typical use is for landscape photography where the haze causes low contrast and low saturation, but it can also be

Grace Ugochi Nneji 3 Feb 15, 2022
The code succinctly shows how our ensemble learning based on deep learning CNN is used for LAM-avulsion-diagnosis.

deep-learning-LAM-avulsion-diagnosis The code succinctly shows how our ensemble learning based on deep learning CNN is used for LAM-avulsion-diagnosis

1 Jan 12, 2022
Codes and models of NeurIPS2021 paper - DominoSearch: Find layer-wise fine-grained N:M sparse schemes from dense neural networks

DominoSearch This is repository for codes and models of NeurIPS2021 paper - DominoSearch: Find layer-wise fine-grained N:M sparse schemes from dense n

11 Sep 10, 2022
Using Python to Play Cyberpunk 2077

CyberPython 2077 Using Python to Play Cyberpunk 2077 This repo will contain code from the Cyberpython 2077 video series on Youtube (youtube.

Harrison 118 Oct 18, 2022
Minimal implementation of Denoised Smoothing: A Provable Defense for Pretrained Classifiers in TensorFlow.

Denoised-Smoothing-TF Minimal implementation of Denoised Smoothing: A Provable Defense for Pretrained Classifiers in TensorFlow. Denoised Smoothing is

Sayak Paul 19 Dec 11, 2022
Speech-Emotion-Analyzer - The neural network model is capable of detecting five different male/female emotions from audio speeches. (Deep Learning, NLP, Python)

Speech Emotion Analyzer The idea behind creating this project was to build a machine learning model that could detect emotions from the speech we have

Mitesh Puthran 965 Dec 24, 2022
Does MAML Only Work via Feature Re-use? A Data Set Centric Perspective

Does-MAML-Only-Work-via-Feature-Re-use-A-Data-Set-Centric-Perspective Does MAML Only Work via Feature Re-use? A Data Set Centric Perspective Installin

2 Nov 07, 2022
Representing Long-Range Context for Graph Neural Networks with Global Attention

Graph Augmentation Graph augmentation/self-supervision/etc. Algorithms gcn gcn+virtual node gin gin+virtual node PNA GraphTrans Augmentation methods N

UC Berkeley RISE 67 Dec 30, 2022
A Confidence-based Iterative Solver of Depths and Surface Normals for Deep Multi-view Stereo

idn-solver Paper | Project Page This repository contains the code release of our ICCV 2021 paper: A Confidence-based Iterative Solver of Depths and Su

zhaowang 43 Nov 17, 2022
The code for our paper submitted to RAL/IROS 2022: OverlapTransformer: An Efficient and Rotation-Invariant Transformer Network for LiDAR-Based Place Recognition.

OverlapTransformer The code for our paper submitted to RAL/IROS 2022: OverlapTransformer: An Efficient and Rotation-Invariant Transformer Network for

HAOMO.AI 136 Jan 03, 2023
[NeurIPS 2021] Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods

Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods Large Scale Learning on Non-Homophilous Graphs: New Benchmark

60 Jan 03, 2023
Source code for our paper "Learning to Break Deep Perceptual Hashing: The Use Case NeuralHash"

Learning to Break Deep Perceptual Hashing: The Use Case NeuralHash Abstract: Apple recently revealed its deep perceptual hashing system NeuralHash to

<a href=[email protected]"> 11 Dec 03, 2022
A set of tools for creating and testing machine learning features, with a scikit-learn compatible API

Feature Forge This library provides a set of tools that can be useful in many machine learning applications (classification, clustering, regression, e

Machinalis 380 Nov 05, 2022