Code & Models for Temporal Segment Networks (TSN) in ECCV 2016

Overview

Temporal Segment Networks (TSN)

We have released MMAction, a full-fledged action understanding toolbox based on PyTorch. It includes an implementation of TSN as well as other state-of-the-art frameworks for various tasks. We highly recommend you switch to it. This repo will continue to be supported for Caffe users.

This repository holds the code and models for the papers

Temporal Segment Networks for Action Recognition in Videos, Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, and Luc Van Gool, TPAMI, 2018.

[Arxiv Preprint]

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition, Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, and Luc Van Gool, ECCV 2016, Amsterdam, Netherlands.

[Arxiv Preprint]

News & Updates

Jul. 20, 2018 - For those having trouble building the TSN toolkit, we have provided a prebuilt Docker image. Download it from DockerHub. It contains OpenCV, Caffe, DenseFlow, and this codebase, all built and ready to use with NVIDIA-Docker.

Sep. 8, 2017 - We released TSN models trained on the Kinetics dataset with 76.6% single model top-1 accuracy. Find the model weights and transfer learning experiment results on the website.

Aug. 10, 2017 - An experimental PyTorch implementation of TSN is released on GitHub.

Nov. 5, 2016 - The project page for TSN is online. website

Sep. 14, 2016 - We fixed a legacy bug in Caffe. Some parameters in TSN training are affected. You are advised to update to the latest version.

FAQ, How to add a custom dataset

Below is the guidance to reproduce the reported results and explore more.


Usage Guide

Prerequisites

[back to top]

The code has a few dependencies. The major libraries we use are Caffe, dense_flow, and OpenCV.

The codebase is written in Python. We recommend the Anaconda Python distribution. Matlab scripts are provided for some critical steps like video-level testing.

The most straightforward way to install these libraries is to run the build_all.sh script.

Besides software, GPU(s) are required for optical flow extraction and model training. Our Caffe modification supports highly efficient parallel training. Just throw in as many GPUs as you like and enjoy.

Code & Data Preparation

Get the code

[back to top]

Use git to clone this repository and its submodules

git clone --recursive https://github.com/yjxiong/temporal-segment-networks

Then run the building scripts to build the libraries.

bash build_all.sh

It will build Caffe and dense_flow. Since we need OpenCV to have Video IO, which is absent in most default installations, it will also download and build a local installation of OpenCV and use its Python interfaces.

Note that to run training with multiple GPUs, one needs to enable MPI support of Caffe. To do this, run

MPI_PREFIX=<root path to openmpi installation> bash build_all.sh MPI_ON

Get the videos

[back to top]

We experimented on two mainstream action recognition datasets: UCF-101 and HMDB51. Videos can be downloaded directly from their websites. After download, please extract the videos from the rar archives.

  • UCF101: the UCF101 videos are archived in the downloaded file. Please use unrar x UCF101.rar to extract the videos.
  • HMDB51: the HMDB51 video archive has two levels of packaging. The following commands illustrate how to extract the videos.
mkdir rars && mkdir videos
unrar x hmdb51-org.rar rars/
for a in rars/*; do unrar x "${a}" videos/; done

Get trained models

[back to top]

We provide the trained models in Caffe format: network specifications as Protobuf messages, plus the model weights. The codebase includes the model specs for UCF101 and HMDB51. The model weights can be downloaded by running the script

bash scripts/get_reference_models.sh

Extract Frames and Optical Flow Images

[back to top]

To run training and testing, we need to decompose the videos into frames. In addition, the temporal stream networks need optical flow or warped optical flow images as input.

Both can be produced with the script scripts/extract_optical_flow.sh. The script takes three arguments

  • SRC_FOLDER points to the folder where you put the video dataset
  • OUT_FOLDER points to the root folder where the extracted frames and optical flow images will be put
  • NUM_WORKER specifies the number of GPUs to use in parallel for flow extraction; it must be larger than 1

The command for running optical flow extraction is as follows

bash scripts/extract_optical_flow.sh SRC_FOLDER OUT_FOLDER NUM_WORKER

It will take from several hours to several days to extract optical flow for a whole dataset, depending on the number of GPUs.

Testing Provided Models

Get reference models

[back to top]

To help reproduce the results reported in the paper, we provide reference models trained by us for instant testing. Please use the following command to get the reference models.

bash scripts/get_reference_models.sh

Video-level testing

[back to top]

We provide a Python framework to run the testing. For the benchmark datasets, we will measure average accuracy on the testing splits. We also provide the facility to analyze a single video.

Generally, to test on the benchmark dataset, we can use the scripts eval_net.py and eval_scores.py.

For example, to test the reference rgb stream model on split 1 of UCF101 with 4 GPUs, run

python tools/eval_net.py ucf101 1 rgb FRAME_PATH \
 models/ucf101/tsn_bn_inception_rgb_deploy.prototxt models/ucf101_split_1_tsn_rgb_reference_bn_inception.caffemodel \
 --num_worker 4 --save_scores SCORE_FILE

where FRAME_PATH is the path you extracted the frames of UCF-101 to and SCORE_FILE is the filename to store the extracted scores.

One can also use cached score files to evaluate the performance. To do this, issue the following command

python tools/eval_scores.py SCORE_FILE

The more important function of eval_scores.py is modality fusion. For example, once we have the scores of the rgb stream in RGB_SCORE_FILE and those of the flow stream in FLOW_SCORE_FILE, the fusion result with weights of 1:1.5 can be obtained with

python tools/eval_scores.py RGB_SCORE_FILE FLOW_SCORE_FILE --score_weights 1 1.5

To view the full help message of these scripts, run python tools/eval_net.py -h or python tools/eval_scores.py -h.
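The 1:1.5 fusion described above amounts to a weighted sum of per-video class scores. The following is a minimal sketch of that idea only, not the implementation in eval_scores.py, which may normalize or aggregate scores differently. The function names fuse_scores and top1_accuracy and the toy numbers are hypothetical, introduced for illustration.

```python
def fuse_scores(score_lists, weights):
    """Weighted sum of per-video class scores from several modalities.

    score_lists: one entry per modality; each entry is a list of rows,
                 one row of class scores per video.
    weights:     one scalar per modality, e.g. [1, 1.5] for rgb:flow.
    """
    if len(score_lists) != len(weights):
        raise ValueError("need exactly one weight per modality")
    fused = []
    for video_idx in range(len(score_lists[0])):
        num_classes = len(score_lists[0][video_idx])
        row = [0.0] * num_classes
        for scores, weight in zip(score_lists, weights):
            for class_idx in range(num_classes):
                row[class_idx] += weight * scores[video_idx][class_idx]
        fused.append(row)
    return fused


def top1_accuracy(scores, labels):
    """Fraction of videos whose highest-scoring class equals the label."""
    correct = sum(
        1 for row, label in zip(scores, labels) if row.index(max(row)) == label
    )
    return correct / len(labels)


# Toy data: 3 videos, 2 classes (hypothetical numbers, not real results).
rgb = [[0.6, 0.4], [0.7, 0.3], [0.2, 0.8]]
flow = [[0.3, 0.7], [0.6, 0.4], [0.1, 0.9]]
labels = [1, 0, 1]

fused = fuse_scores([rgb, flow], [1, 1.5])
print(top1_accuracy(rgb, labels), top1_accuracy(fused, labels))
```

In this toy case the flow scores correct the one video that the rgb stream misclassifies, which is the effect fusion is meant to exploit.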

Training Temporal Segment Networks

[back to top]

Training TSN is straightforward. We have provided the necessary model specs, solver configs, and initialization models. To achieve optimal training speed, we strongly advise you to turn on the parallel training support in the Caffe toolbox using the following build command

MPI_PREFIX=<root path to openmpi installation> bash build_all.sh MPI_ON

where <root path to openmpi installation> is the directory in which OpenMPI is installed, for example /usr/local/openmpi/.

Construct file lists for training and validation

[back to top]

The data feeding in training relies on VideoDataLayer in Caffe. This layer uses a list file to specify its data sources. Each line of the list file will contain a tuple of extracted video frame path, video frame number, and video groundtruth class. A list file looks like

video_frame_path 100 10
video_2_frame_path 150 31
...

To build the file lists for all 3 splits of the two benchmark datasets, we have provided a script. Just use the following command

bash scripts/build_file_list.sh ucf101 FRAME_PATH

and

bash scripts/build_file_list.sh hmdb51 FRAME_PATH

The generated list files will be put in data/ with names like ucf101_flow_val_split_2.txt.
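To make the list file format concrete, here is a small sketch that parses and re-emits lines of the form shown above (frame path, frame count, class label). It is for illustration only and is not part of the codebase; VideoRecord, parse_list_line, and format_list_line are hypothetical names.

```python
from collections import namedtuple

# One entry of a list file: where the frames live, how many there are,
# and the ground-truth class index.
VideoRecord = namedtuple("VideoRecord", ["frame_path", "num_frames", "label"])


def parse_list_line(line):
    """Split a list-file line from the right into its three fields."""
    frame_path, num_frames, label = line.strip().rsplit(None, 2)
    return VideoRecord(frame_path, int(num_frames), int(label))


def format_list_line(record):
    """Render a record back into the space-separated list-file format."""
    return "{} {} {}".format(record.frame_path, record.num_frames, record.label)


lines = ["video_frame_path 100 10", "video_2_frame_path 150 31"]
records = [parse_list_line(line) for line in lines]
for record in records:
    print(record.frame_path, record.num_frames, record.label)
```

A parser like this can be handy for sanity-checking a generated list, for example confirming that every frame count is positive before starting a training run.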

Get initialization models

[back to top]

We have built the initialization model weights for both rgb and flow input. The flow initialization models implement the cross-modality training technique in the paper. To download the model weights, run

bash scripts/get_init_models.sh

Start training

[back to top]

Once all necessities are ready, we can start training TSN. For this, use the script scripts/train_tsn.sh. For example, the following command runs training on UCF101 with rgb input

bash scripts/train_tsn.sh ucf101 rgb

The training will run with default settings on 4 GPUs. Usually, it takes around 1 hour to train the rgb model and 4 hours for flow models, on 4 GTX Titan X GPUs.

The learned model weights will be saved in models/. The aforementioned testing process can be used to evaluate them.

Config the training process

[back to top]

Here we provide some information on customizing the training process

  • Change split: By default, the training is conducted on split 1 of the datasets. To change the split, one can modify the corresponding model specs and solver files. For example, to train on split 2 of UCF101 with rgb input, one can modify the file models/ucf101/tsn_bn_inception_rgb_train_val.prototxt. On line 8, change

source: "data/ucf101_rgb_train_split_1.txt"

to

source: "data/ucf101_rgb_train_split_2.txt"

On line 34, change

source: "data/ucf101_rgb_val_split_1.txt"

to

source: "data/ucf101_rgb_val_split_2.txt"

Also, in the solver file models/ucf101/tsn_bn_inception_rgb_solver.prototxt, on line 12 change

snapshot_prefix: "models/ucf101_split1_tsn_rgb_bn_inception"

to

snapshot_prefix: "models/ucf101_split2_tsn_rgb_bn_inception"

in order to distinguish the learned weights.

  • Change GPU number: In general, one can use any number of GPUs for training. To use more or fewer GPUs, change N_GPU in scripts/train_tsn.sh. Important notice: when the GPU number is changed, the effective batch size also changes. It is better to always keep the effective batch size, which equals batch_size*iter_size*n_gpu, at 128. Here, batch_size is the number in the model's prototxt, for example line 9 in models/ucf101/tsn_bn_inception_rgb_train_val.prototxt.
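The batch size rule above is simple arithmetic, sketched below. The helper names and the batch_size value of 32 are assumptions made for illustration; check the actual batch_size in your prototxt before relying on the numbers.

```python
def effective_batch_size(batch_size, iter_size, n_gpu):
    """Effective batch size as defined above: batch_size * iter_size * n_gpu."""
    return batch_size * iter_size * n_gpu


def iter_size_for_target(batch_size, n_gpu, target=128):
    """Return the iter_size that keeps the effective batch size at `target`.

    Raises ValueError when no integer iter_size can hit the target exactly.
    """
    per_iteration = batch_size * n_gpu
    if per_iteration <= 0 or target % per_iteration != 0:
        raise ValueError(
            "target {} is not a multiple of batch_size*n_gpu = {}".format(
                target, per_iteration
            )
        )
    return target // per_iteration


# Hypothetical per-GPU batch_size of 32 (an assumption, not read from the repo).
for n_gpu in (4, 2, 1):
    iter_size = iter_size_for_target(batch_size=32, n_gpu=n_gpu)
    print(n_gpu, iter_size, effective_batch_size(32, iter_size, n_gpu))
```

Under that assumption, halving the number of GPUs means doubling iter_size, so the product stays at 128.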

Other Info

[back to top]

Citation

Please cite the following paper if you find this repository useful.

@inproceedings{TSN2016ECCV,
  author    = {Limin Wang and
               Yuanjun Xiong and
               Zhe Wang and
               Yu Qiao and
               Dahua Lin and
               Xiaoou Tang and
               Luc {Van Gool}},
  title     = {Temporal Segment Networks: Towards Good Practices for Deep Action Recognition},
  booktitle = {ECCV},
  year      = {2016},
}

Related Projects

Contact

For any question, please contact

Yuanjun Xiong: [email protected]
Limin Wang: [email protected]