This is the official implementation of Elaborative Rehearsal for Zero-shot Action Recognition (ICCV2021)

Last update: Sep 24, 2022

Elaborative Rehearsal for Zero-shot Action Recognition

This is an official implementation of:

Shizhe Chen and Dong Huang, "Elaborative Rehearsal for Zero-shot Action Recognition", ICCV, 2021 (arXiv version).

By elaborating each new concept and relating it to known concepts, zero-shot action recognition models begin to be comparable to supervised models trained on a few samples.

We also achieve new state-of-the-art results on the standard ZSAR benchmarks (Olympics, HMDB51, UCF101) and on the first large-scale ZSAR benchmark, which we propose, built on the Kinetics database.

Installation

git clone https://github.com/DeLightCMU/ElaborativeRehearsal.git
cd ElaborativeRehearsal
export PYTHONPATH=$(pwd):${PYTHONPATH}

pip install -r requirements.txt

# download pretrained models
bash scripts/download_premodels.sh
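
The `export PYTHONPATH=...` line above adds the repository root on every run and leaves a trailing colon when `PYTHONPATH` starts out empty. A minimal sketch of a more defensive variant follows; `prepend_path` is a hypothetical helper name used only for illustration and is not part of the repository.

```shell
# Hypothetical helper (not part of the repository): prepend a directory to a
# colon-separated path list without adding it twice or leaving a trailing colon.
prepend_path() {
  dir=$1
  list=$2
  case ":$list:" in
    *":$dir:"*) printf '%s\n' "$list" ;;        # already present: leave unchanged
    *) printf '%s\n' "$dir${list:+:$list}" ;;   # prepend; add ':' only if list is non-empty
  esac
}

# Run from the repository root, as in the installation steps above.
export PYTHONPATH="$(prepend_path "$(pwd)" "${PYTHONPATH:-}")"
```

Sourcing this more than once in the same session leaves `PYTHONPATH` unchanged after the first time.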

Zero-shot Action Recognition (ZSAR)

Extract Features in Video

spatial-temporal features

bash scripts/extract_tsm_features.sh '0,1,2'

object features

bash scripts/extract_object_features.sh '0,1,2'

ZSAR Training and Inference

Baselines: DEVISE, ALE, SJE, DEM, ESZSL and GCN.

# mtype: devise, ale, sje, dem, eszsl
mtype=devise
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_baselines.py zeroshot/configs/zsl_baseline_${mtype}_config.yaml ${mtype} --is_train
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_baselines.py zeroshot/configs/zsl_baseline_${mtype}_config.yaml ${mtype} --eval_set tst
# evaluate other splits
ksplit=1
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_baselines_eval_splits.py zeroshot/configs/zsl_baseline_${mtype}_config.yaml ${mtype} ${ksplit}

# gcn
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_kgraphs.py zeroshot/configs/zsl_baseline_kgraph_config.yaml --is_train
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_kgraphs.py zeroshot/configs/zsl_baseline_kgraph_config.yaml --eval_set tst
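
To run all five `mtype` baselines in sequence, the train and test commands above can be wrapped in a loop. The sketch below is one way to do that. `DRY_RUN`, `run_cmd` and `run_baseline` are hypothetical names introduced here, not part of the repository. By default it only prints the commands, so they can be checked before any GPU time is spent.

```shell
# Sketch only: DRY_RUN, run_cmd and run_baseline are hypothetical helpers.
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 executes it.
DRY_RUN=${DRY_RUN:-1}

run_cmd() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Train one baseline, then evaluate it on the test set, using the same
# driver and config naming as the commands above.
run_baseline() {
  mtype=$1
  config=zeroshot/configs/zsl_baseline_${mtype}_config.yaml
  run_cmd env CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_baselines.py "$config" "$mtype" --is_train
  run_cmd env CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_baselines.py "$config" "$mtype" --eval_set tst
}

for mtype in devise ale sje dem eszsl; do
  run_baseline "$mtype"
done
```

Setting `DRY_RUN=0` before running the script executes the commands instead of printing them.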

ER-ZSAR and ablations:

# TSM + ED class representation + AttnPool (2nd row in Table 4(b))
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_vse.py zeroshot/configs/zsl_vse_wordembed_config.yaml --is_train --resume_file datasets/Kinetics/zsl220/word.glove42b.th

# TSM + ED class representation + BERT (last row in Table 4(a) and Table 4(b))
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_vse.py zeroshot/configs/zsl_vse_config.yaml --is_train

# Obj + ED class representation + BERT + ER Loss (last row in Table 4(c))
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_cptembed.py zeroshot/configs/zsl_cpt_config.yaml --is_train

# ER-ZSAR Full Model
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_ervse.py zeroshot/configs/zsl_ervse_config.yaml --is_train
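
Each of the four variants above pairs a different driver with a different config file, so a small lookup can help keep them straight. In the sketch below, the function name `er_command` and the short variant names (`wordembed`, `bert`, `object`, `full`) are hypothetical labels chosen for illustration. Only the commands they print come from the list above.

```shell
# Hypothetical helper: print the training command for a named variant.
# The variant names are illustrative labels, not identifiers from the repository.
er_command() {
  case "$1" in
    wordembed)  # TSM + ED class representation + AttnPool
      echo "CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_vse.py zeroshot/configs/zsl_vse_wordembed_config.yaml --is_train --resume_file datasets/Kinetics/zsl220/word.glove42b.th" ;;
    bert)       # TSM + ED class representation + BERT
      echo "CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_vse.py zeroshot/configs/zsl_vse_config.yaml --is_train" ;;
    object)     # Obj + ED class representation + BERT + ER Loss
      echo "CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_cptembed.py zeroshot/configs/zsl_cpt_config.yaml --is_train" ;;
    full)       # ER-ZSAR full model
      echo "CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_ervse.py zeroshot/configs/zsl_ervse_config.yaml --is_train" ;;
    *)
      echo "unknown variant: $1" >&2
      return 1 ;;
  esac
}

# Print the command for the full model.
er_command full
```

Running `eval "$(er_command full)"` from the repository root would execute the printed command rather than just display it.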

Citation

If you find this repository useful, please cite our paper:

@inproceedings{ChenHuang2021ER,
  title={Elaborative Rehearsal for Zero-shot Action Recognition},
  author={Shizhe Chen and Dong Huang},
  booktitle={ICCV},
  year={2021}
}
