Code & Models for Temporal Segment Networks (TSN) in ECCV 2016

Overview

Temporal Segment Networks (TSN)

We have released MMAction, a full-fledged action understanding toolbox based on PyTorch. It includes implementation for TSN as well as other STOA frameworks for various tasks. We highly recommend you switch to it. This repo will keep on being suppported for Caffe users.

This repository holds the codes and models for the papers

Temporal Segment Networks for Action Recognition in Videos, Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, and Luc Van Gool, TPAMI, 2018.

[Arxiv Preprint]

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition, Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, and Luc Van Gool, ECCV 2016, Amsterdam, Netherlands.

[Arxiv Preprint]

News & Updates

Jul. 20, 2018 - For those having trouble building the TSN toolkit, we have provided a built docker image you can use. Download it from DockerHub. It contains OpenCV, Caffe, DenseFlow, and this codebase. All built and ready to use with NVIDIA-Docker

Sep. 8, 2017 - We released TSN models trained on the Kinetics dataset with 76.6% single model top-1 accuracy. Find the model weights and transfer learning experiment results on the website.

Aug 10, 2017 - An experimental pytorch implementation of TSN is released github

Nov. 5, 2016 - The project page for TSN is online. website

Sep. 14, 2016 - We fixed a legacy bug in Caffe. Some parameters in TSN training are affected. You are advised to update to the latest version.

FAQ, How to add a custom dataset

Below is the guidance to reproduce the reported results and explore more.

Contents


Usage Guide

Prerequisites

[back to top]

There are a few dependencies to run the code. The major libraries we use are

The codebase is written in Python. We recommend the Anaconda Python distribution. Matlab scripts are provided for some critical steps like video-level testing.

The most straightforward method to install these libraries is to run the build-all.sh script.

Besides software, GPU(s) are required for optical flow extraction and model training. Our Caffe modification supports highly efficient parallel training. Just throw in as many GPUs as you like and enjoy.

Code & Data Preparation

Get the code

[back to top]

Use git to clone this repository and its submodules

git clone --recursive https://github.com/yjxiong/temporal-segment-networks

Then run the building scripts to build the libraries.

bash build_all.sh

It will build Caffe and dense_flow. Since we need OpenCV to have Video IO, which is absent in most default installations, it will also download and build a local installation of OpenCV and use its Python interfaces.

Note that to run training with multiple GPUs, one needs to enable MPI support of Caffe. To do this, run

MPI_PREFIX=<root path to openmpi installation> bash build_all.sh MPI_ON

Get the videos

[back to top]

We experimented on two mainstream action recognition datasets: UCF-101 and HMDB51. Videos can be downloaded directly from their websites. After download, please extract the videos from the rar archives.

  • UCF101: the ucf101 videos are archived in the downloaded file. Please use unrar x UCF101.rar to extract the videos.
  • HMDB51: the HMDB51 video archive has two-level of packaging. The following commands illustrate how to extract the videos.
mkdir rars && mkdir videos
unrar x hmdb51-org.rar rars/
for a in $(ls rars); do unrar x "rars/${a}" videos/; done;

Get trained models

[back to top]

We provided the trained model weights in Caffe style, consisting of specifications in Protobuf messages, and model weights. In the codebase we provide the model spec for UCF101 and HMDB51. The model weights can be downloaded by running the script

bash scripts/get_reference_models.sh

Extract Frames and Optical Flow Images

[back to top]

To run the training and testing, we need to decompose the video into frames. Also the temporal stream networks need optical flow or warped optical flow images for input.

These can be achieved with the script scripts/extract_optical_flow.sh. The script has three arguments

  • SRC_FOLDER points to the folder where you put the video dataset
  • OUT_FOLDER points to the root folder where the extracted frames and optical images will be put in
  • NUM_WORKER specifies the number of GPU to use in parallel for flow extraction, must be larger than 1

The command for running optical flow extraction is as follows

bash scripts/extract_optical_flow.sh SRC_FOLDER OUT_FOLDER NUM_WORKER

It will take from several hours to several days to extract optical flows for the whole datasets, depending on the number of GPUs.

Testing Provided Models

Get reference models

[back to top]

To help reproduce the results reported in the paper, we provide reference models trained by us for instant testing. Please use the following command to get the reference models.

bash scripts/get_reference_models.sh

Video-level testing

[back to top]

We provide a Python framework to run the testing. For the benchmark datasets, we will measure average accuracy on the testing splits. We also provide the facility to analyze a single video.

Generally, to test on the benchmark dataset, we can use the scripts eval_net.py and eval_scores.py.

For example, to test the reference rgb stream model on split 1 of ucf 101 with 4 GPUs, run

python tools/eval_net.py ucf101 1 rgb FRAME_PATH \
 models/ucf101/tsn_bn_inception_rgb_deploy.prototxt models/ucf101_split_1_tsn_rgb_reference_bn_inception.caffemodel \
 --num_worker 4 --save_scores SCORE_FILE

where FRAME_PATH is the path you extracted the frames of UCF-101 to and SCORE_FILE is the filename to store the extracted scores.

One can also use cached score files to evaluate the performance. To do this, issue the following command

python tools/eval_scores.py SCORE_FILE

The more important function of eval_scores.py is to do modality fusion. For example, once we got the scores of rgb stream in RGB_SCORE_FILE and flow stream in FLOW_SCORE_FILE. The fusion result with weights of 1:1.5 can be achieved with

python tools/eval_scores.py RGB_SCORE_FILE FLOW_SCORE_FILE --score_weights 1 1.5

To view the full help message of these scripts, run python eval_net.py -h or python eval_scores.py -h.

Training Temporal Segment Networks

[back to top]

Training TSN is straightforward. We have provided the necessary model specs, solver configs, and initialization models. To achieve optimal training speed, we strongly advise you to turn on the parallel training support in the Caffe toolbox using following build command

MPI_PREFIX=<root path to openmpi installation> bash build_all.sh MPI_ON

where root path to openmpi installation points to the installation of the OpenMPI, for example /usr/local/openmpi/.

Construct file lists for training and validation

[back to top]

The data feeding in training relies on VideoDataLayer in Caffe. This layer uses a list file to specify its data sources. Each line of the list file will contain a tuple of extracted video frame path, video frame number, and video groundtruth class. A list file looks like

video_frame_path 100 10
video_2_frame_path 150 31
...

To build the file lists for all 3 splits of the two benchmark dataset, we have provided a script. Just use the following command

bash scripts/build_file_list.sh ucf101 FRAME_PATH

and

bash scripts/build_file_list.sh hmdb51 FRAME_PATH

The generated list files will be put in data/ with names like ucf101_flow_val_split_2.txt.

Get initialization models

[back to top]

We have built the initialization model weights for both rgb and flow input. The flow initialization models implements the cross-modality training technique in the paper. To download the model weights, run

bash scripts/get_init_models.sh

Start training

[back to top]

Once all necessities ready, we can start training TSN. For this, use the script scripts/train_tsn.sh. For example, the following command runs training on UCF101 with rgb input

bash scripts/train_tsn.sh ucf101 rgb

the training will run with default settings on 4 GPUs. Usually, it takes around 1 hours to train the rgb model and 4 hours for flow models, on 4 GTX Titan X GPUs.

The learned model weights will be saved in models/. The aforementioned testing process can be used to evaluate them.

Config the training process

[back to top]

Here we provide some information on customizing the training process

  • Change split: By default, the training is conducted on split 1 of the datasets. To change the split, one can modify corresponding model specs and solver files. For example, to train on split 2 of UCF101 with rgb input, one can modify the file models/ucf101/tsn_bn_inception_rgb_train_val.prototxt. On line 8, change
source: "data/ucf101_rgb_train_split_1.txt"`

to

`source: "data/ucf101_rgb_train_split_2.txt"`

On line 34, change

source: "data/ucf101_rgb_val_split_1.txt"

to

source: "data/ucf101_rgb_val_split_2.txt"

Also, in the solver file models/ucf101/tsn_bn_inception_rgb_solver.prototxt, on line 12 change

snapshot_prefix: "models/ucf101_split1_tsn_rgb_bn_inception"

to

snapshot_prefix: "models/ucf101_split2_tsn_rgb_bn_inception"

in order to distiguish the learned weights.

  • Change GPU number, in general, one can use any number of GPU to do the training. To use more or less GPU, one can change the N_GPU in scripts/train_tsn.sh. Important notice: when the GPU number is changed, the effective batchsize is also changed. It's better to always make sure the effective batchsize, which equals to batch_size*iter_size*n_gpu, to be 128. Here, batch_size is the number in the model's prototxt, for example line 9 in models/ucf101/tsn_bn_inception_rgb_train_val.protoxt.

Other Info

[back to top]

Citation

Please cite the following paper if you feel this repository useful.

@inproceedings{TSN2016ECCV,
  author    = {Limin Wang and
               Yuanjun Xiong and
               Zhe Wang and
               Yu Qiao and
               Dahua Lin and
               Xiaoou Tang and
               Luc {Val Gool}},
  title     = {Temporal Segment Networks: Towards Good Practices for Deep Action Recognition},
  booktitle   = {ECCV},
  year      = {2016},
}

Related Projects

Contact

For any question, please contact

Yuanjun Xiong: [email protected]
Limin Wang: [email protected]
Owner
Young and simple. [email protected] -> Amazon Rekognition. We are hiring summer interns for 20
Real-CUGAN - Real Cascade U-Nets for Anime Image Super Resolution

Real Cascade U-Nets for Anime Image Super Resolution 中文 | English 🔥 Real-CUGAN

tarsin 111 Dec 28, 2022
Self-Supervised Pre-Training for Transformer-Based Person Re-Identification

Self-Supervised Pre-Training for Transformer-Based Person Re-Identification [pdf] The official repository for Self-Supervised Pre-Training for Transfo

Hao Luo 116 Jan 04, 2023
PyTorch wrapper for Taichi data-oriented class

Stannum PyTorch wrapper for Taichi data-oriented class PRs are welcomed, please see TODOs. Usage from stannum import Tin import torch data_oriented =

86 Dec 23, 2022
Pytorch implementation for Semantic Segmentation/Scene Parsing on MIT ADE20K dataset

Semantic Segmentation on MIT ADE20K dataset in PyTorch This is a PyTorch implementation of semantic segmentation models on MIT ADE20K scene parsing da

MIT CSAIL Computer Vision 4.5k Jan 08, 2023
PyTorch implementations of Top-N recommendation, collaborative filtering recommenders.

PyTorch implementations of Top-N recommendation, collaborative filtering recommenders.

Yoonki Jeong 129 Dec 22, 2022
Semantic code search implementation using Tensorflow framework and the source code data from the CodeSearchNet project

Semantic Code Search Semantic code search implementation using Tensorflow framework and the source code data from the CodeSearchNet project. The model

Chen Wu 24 Nov 29, 2022
A set of tools to pre-calibrate and calibrate (multi-focus) plenoptic cameras (e.g., a Raytrix R12) based on the libpleno.

COMPOTE: Calibration Of Multi-focus PlenOpTic camEra. COMPOTE is a set of tools to pre-calibrate and calibrate (multifocus) plenoptic cameras (e.g., a

ComSEE - Computers that SEE 4 May 10, 2022
Object detection and instance segmentation toolkit based on PaddlePaddle.

Object detection and instance segmentation toolkit based on PaddlePaddle.

9.3k Jan 02, 2023
Official implementation of the paper DeFlow: Learning Complex Image Degradations from Unpaired Data with Conditional Flows

DeFlow: Learning Complex Image Degradations from Unpaired Data with Conditional Flows Official implementation of the paper DeFlow: Learning Complex Im

Valentin Wolf 86 Nov 16, 2022
A Python wrapper for Google Tesseract

Python Tesseract Python-tesseract is an optical character recognition (OCR) tool for python. That is, it will recognize and "read" the text embedded i

Matthias A Lee 4.6k Jan 05, 2023
This is a repo of basic Machine Learning!

Basic Machine Learning This repository contains a topic-wise curated list of Machine Learning and Deep Learning tutorials, articles and other resource

Ekram Asif 53 Dec 31, 2022
End-to-end beat and downbeat tracking in the time domain.

WaveBeat End-to-end beat and downbeat tracking in the time domain. | Paper | Code | Video | Slides | Setup First clone the repo. git clone https://git

Christian J. Steinmetz 60 Dec 24, 2022
PyTorch implementation of Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation.

ALiBi PyTorch implementation of Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation. Quickstart Clone this reposit

Jake Tae 4 Jul 27, 2022
Incorporating Transformer and LSTM to Kalman Filter with EM algorithm

Deep learning based state estimation: incorporating Transformer and LSTM to Kalman Filter with EM algorithm Overview Kalman Filter requires the true p

zshicode 57 Dec 27, 2022
Sequential model-based optimization with a `scipy.optimize` interface

Scikit-Optimize Scikit-Optimize, or skopt, is a simple and efficient library to minimize (very) expensive and noisy black-box functions. It implements

Scikit-Optimize 2.5k Jan 04, 2023
A library for researching neural networks compression and acceleration methods.

A library for researching neural networks compression and acceleration methods.

Intel Labs 100 Dec 29, 2022
BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment

BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment

Holy Wu 35 Jan 01, 2023
A plug-and-play library for neural networks written in Python

A plug-and-play library for neural networks written in Python!

Dimos Michailidis 2 Jul 16, 2022
Bootstrapped Unsupervised Sentence Representation Learning (ACL 2021)

Install first pip3 install -e . Training python3 training/unsupervised_tuning.py python3 training/supervised_tuning.py python3 training/multilingual_

yanzhang_nlp 26 Jul 22, 2022
Physics-Informed Neural Networks (PINN) and Deep BSDE Solvers of Differential Equations for Scientific Machine Learning (SciML) accelerated simulation

NeuralPDE NeuralPDE.jl is a solver package which consists of neural network solvers for partial differential equations using scientific machine learni

SciML Open Source Scientific Machine Learning 680 Jan 02, 2023