The implementation of CVPR2021 paper Temporal Query Networks for Fine-grained Video Understanding, by Chuhan Zhang, Ankush Gupta and Andrew Zisserman.

Overview

Temporal Query Networks for Fine-grained Video Understanding

📋 This repository contains the implementation of CVPR2021 paper Temporal_Query_Networks for Fine-grained Video Understanding

Abstract

Our objective in this work is fine-grained classification of actions in untrimmed videos, where the actions may be temporally extended or may span only a few frames of the video. We cast this into a query-response mechanism, where each query addresses a particular question, and has its own response label set.

We make the following four contributions: (i) We propose a new model — a Temporal Query Network — which enables the query-response functionality, and a structural undertanding of fine-grained actions. It attends to relevant segments for each query with a temporal attention mechanism, and can be trained using only the labels for each query. (ii) We propose a new way — stochastic feature bank update — to train a network on videos of various lengths with the dense sampling required to respond to fine-grained queries. (iii) we compare the TQN to other architectures and text supervision methods, and analyze their pros and cons. Finally, (iv) we evaluate the method extensively on the FineGym and Diving48 benchmarks for fine-grained action classification and surpass the state-of-the-art using only RGB features.

Getting Started

  1. Clone this repository
git clone https://github.com/Chuhanxx/Temporal_Query_Networks.git
  1. Create conda virtual env and install the requirements
    (This implementation requires CUDA and python > 3.7)
cd Temporal_Query_Networks
source build_venv.sh

Prepare Data and Weight Initialization

Please refer to data.md for data preparation.

Training

you can start training the model with the following steps, taking the Diving48 dataset as an example,:

  1. First stage training: Set the paths in the Diving48_first_stage.yaml config file first, and then run:
cd scripts
python train_1st_stage.py --name $EXP_NAME --dataset diving48 --dataset_config ../configs/Diving48_first_stage.yaml --gpus 0,1 --batch_size 16  
  1. Construct stochastically updated feature banks:
python construct_SUFB.py --dataset diving48 --dataset_config ../configs/Diving48_first_stage.yaml \
--gpus 0  --resume_file  $PATH_TO_BEST_FILE_FROM_1ST_STAGE --out_dir $DIR_FOR_SAVING_FEATURES 
  1. Second stage training: Set the paths in the Diving48_second_stage.yaml config file first, and then run:
python train_2nd_stage.py --name $EXP_NAME  --dataset diving48  \
--dataset_config ../configs/Diving48_second_stage.yaml   \
--batch_size 16 --gpus 0,1

Test

python test.py --name $EXP_NAME  --dataset diving48 --batch_size 1 \
--dataset_config ../configs/Diving48_second_stage.yaml 

Citation

If you use this code etc., please cite the following paper:

@inproceedings{zhangtqn,
  title={Temporal Query Networks for Fine-grained Video Understanding},
  author={Chuhan Zhang and Ankush Gputa and Andrew Zisserman},
  booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021}
}

If you have any question, please contact [email protected] .

Github Traffic Insights as Prometheus metrics.

github-traffic Github Traffic collects your repository's traffic data and exposes it as Prometheus metrics. Grafana dashboard that displays the metric

Grafana Labs 34 Oct 27, 2022
Out-of-boundary View Synthesis towards Full-frame Video Stabilization

Out-of-boundary View Synthesis towards Full-frame Video Stabilization Introduction | Update | Results Demo | Introduction This repository contains the

25 Oct 10, 2022
Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.

NuPIC Numenta Platform for Intelligent Computing The Numenta Platform for Intelligent Computing (NuPIC) is a machine intelligence platform that implem

Numenta 6.3k Dec 30, 2022
This repository contains the code for the CVPR 2020 paper "Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision"

Differentiable Volumetric Rendering Paper | Supplementary | Spotlight Video | Blog Entry | Presentation | Interactive Slides | Project Page This repos

697 Jan 06, 2023
A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation

Segnet is deep fully convolutional neural network architecture for semantic pixel-wise segmentation. This is implementation of http://arxiv.org/pdf/15

Pradyumna Reddy Chinthala 190 Dec 15, 2022
Official implementation of AAAI-21 paper "Label Confusion Learning to Enhance Text Classification Models"

Description: This is the official implementation of our AAAI-21 accepted paper Label Confusion Learning to Enhance Text Classification Models. The str

101 Nov 25, 2022
Expand human face editing via Global Direction of StyleCLIP, especially to maintain similarity during editing.

Oh-My-Face This project is based on StyleCLIP, RIFE, and encoder4editing, which aims to expand human face editing via Global Direction of StyleCLIP, e

AiLin Huang 51 Nov 17, 2022
Various operations like path tracking, counting, etc by using yolov5

Object-tracing-with-YOLOv5 Various operations like path tracking, counting, etc by using yolov5

Pawan Valluri 5 Nov 28, 2022
Thermal Control of Laser Powder Bed Fusion using Deep Reinforcement Learning

This repository is the implementation of the paper "Thermal Control of Laser Powder Bed Fusion Using Deep Reinforcement Learning", linked here. The project makes use of the Deep Reinforcement Library

BaratiLab 11 Dec 27, 2022
[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers

TubeDETR: Spatio-Temporal Video Grounding with Transformers Website • STVG Demo • Paper This repository provides the code for our paper. This includes

Antoine Yang 108 Dec 27, 2022
Trax — Deep Learning with Clear Code and Speed

Trax — Deep Learning with Clear Code and Speed Trax is an end-to-end library for deep learning that focuses on clear code and speed. It is actively us

Google 7.3k Dec 26, 2022
TorchDistiller - a collection of the open source pytorch code for knowledge distillation, especially for the perception tasks, including semantic segmentation, depth estimation, object detection and instance segmentation.

This project is a collection of the open source pytorch code for knowledge distillation, especially for the perception tasks, including semantic segmentation, depth estimation, object detection and i

yifan liu 147 Dec 03, 2022
The 3rd place solution for competition

The 3rd place solution for competition "Lyft Motion Prediction for Autonomous Vehicles" at Kaggle Team behind this solution: Artsiom Sanakoyeu [Homepa

Artsiom 104 Nov 22, 2022
Storage-optimizer - Identify potintial optimizations on the cloud storage accounts

Storage Optimizer Identify potintial optimizations on the cloud storage accounts

Zaher Mousa 1 Feb 13, 2022
Contrastive Learning of Image Representations with Cross-Video Cycle-Consistency

Contrastive Learning of Image Representations with Cross-Video Cycle-Consistency This is a official implementation of the CycleContrast introduced in

13 Nov 14, 2022
[WACV 2022] Contextual Gradient Scaling for Few-Shot Learning

CxGrad - Official PyTorch Implementation Contextual Gradient Scaling for Few-Shot Learning Sanghyuk Lee, Seunghyun Lee, and Byung Cheol Song In WACV 2

Sanghyuk Lee 4 Dec 05, 2022
This repository contains the code and models necessary to replicate the results of paper: How to Robustify Black-Box ML Models? A Zeroth-Order Optimization Perspective

Black-Box-Defense This repository contains the code and models necessary to replicate the results of our recent paper: How to Robustify Black-Box ML M

OPTML Group 2 Oct 05, 2022
Code repo for "Transformer on a Diet" paper

Transformer on a Diet Reference: C Wang, Z Ye, A Zhang, Z Zhang, A Smola. "Transformer on a Diet". arXiv preprint arXiv (2020). Installation pip insta

cgraywang 31 Sep 26, 2021
Official PyTorch code of Holistic 3D Scene Understanding from a Single Image with Implicit Representation (CVPR 2021)

Implicit3DUnderstanding (Im3D) [Project Page] Holistic 3D Scene Understanding from a Single Image with Implicit Representation Cheng Zhang, Zhaopeng C

Cheng Zhang 149 Jan 08, 2023
This repository contains an implementation of the Permutohedral Attention Module in Pytorch

Permutohedral_attention_module This repository contains an implementation of the Permutohedral Attention Module

Samuel JOUTARD 26 Nov 27, 2022