Code for CVPR2021 paper "Learning Salient Boundary Feature for Anchor-free Temporal Action Localization"

Overview

AFSD: Learning Salient Boundary Feature for Anchor-free Temporal Action Localization

This is an official implementation in PyTorch of AFSD. Our paper is available at https://arxiv.org/abs/2103.13137

Updates

  • (May, 2021) We released AFSD training and inference code for THUMOS14 dataset.
  • (February, 2021) AFSD is accepted by CVPR2021.

Abstract

Temporal action localization is an important yet challenging task in video understanding. Typically, such a task aims at inferring both the action category and localization of the start and end frame for each action instance in a long, untrimmed video. While most current models achieve good results by using pre-defined anchors and numerous actionness, such methods could be bothered with both large number of outputs and heavy tuning of locations and sizes corresponding to different anchors. Instead, anchor-free methods is lighter, getting rid of redundant hyper-parameters, but gains few attention. In this paper, we propose the first purely anchor-free temporal localization method, which is both efficient and effective. Our model includes (i) an end-to-end trainable basic predictor, (ii) a saliency-based refinement module to gather more valuable boundary features for each proposal with a novel boundary pooling, and (iii) several consistency constraints to make sure our model can find the accurate boundary given arbitrary proposals. Extensive experiments show that our method beats all anchor-based and actionness-guided methods with a remarkable margin on THUMOS14, achieving state-of-the-art results, and comparable ones on ActivityNet v1.3.

Summary

  • First purely anchor-free framework for temporal action detection task.
  • Fully end-to-end method using frames as input rather then features.
  • Saliency-based refinement module to gather more valuable boundary features.
  • Boundary consistency learning to make sure our model can find the accurate boundary.

Performance

Getting Started

Environment

  • Python 3.7
  • PyTorch == 1.4.0 (Please make sure your pytorch version is 1.4)
  • NVIDIA GPU

Setup

pip3 install -r requirements.txt
python3 setup.py develop

Data Preparation

  • THUMOS14 RGB data:
  1. Download post-processed RGB npy data (13.7GB): [Weiyun]
  2. Unzip the RGB npy data to ./datasets/thumos14/validation_npy/ and ./datasets/thumos14/test_npy/
  • THUMOS14 flow data:
  1. Because it costs more time to generate flow data for THUMOS14, to make easy to run flow model, we provide the post-processed flow data in Google Drive and Weiyun (3.4GB): [Google Drive], [Weiyun]
  2. Unzip the flow npy data to ./datasets/thumos14/validation_flow_npy/ and ./datasets/thumos14/test_flow_npy/

If you want to generate npy data by yourself, please refer to the following guidelines:

  • RGB data generation manually:
  1. To construct THUMOS14 RGB npy inputs, please download the THUMOS14 training and testing videos.
    Training videos: https://storage.googleapis.com/thumos14_files/TH14_validation_set_mp4.zip
    Testing videos: https://storage.googleapis.com/thumos14_files/TH14_Test_set_mp4.zip
    (unzip password is THUMOS14_REGISTERED)
  2. Move the training videos to ./datasets/thumos14/validation/ and the testing videos to ./datasets/thumos14/test/
  3. Run the data processing script: python3 AFSD/common/video2npy.py
  • Flow data generation manually:
  1. If you should generate flow data manually, firstly install the denseflow.
  2. Prepare the post-processed RGB data.
  3. Check and run the script: python3 AFSD/common/gen_denseflow_npy.py

Inference

We provide the pretrained models contain I3D backbone model and final RGB and flow models for THUMOS14 dataset: [Google Drive], [Weiyun]

# run RGB model
python3 AFSD/thumos14/test.py configs/thumos14.yaml --checkpoint_path=models/thumos14/checkpoint-15.ckpt --output_json=thumos14_rgb.json

# run flow model
python3 AFSD/thumos14/test.py configs/thumos14_flow.yaml --checkpoint_path=models/thumos14_flow/checkpoint-16.ckpt --output_json=thumos14_flow.json

# run fusion (RGB + flow) model
python3 AFSD/thumos14/test.py configs/thumos14.yaml --fusion --output_json=thumos14_fusion.json

Evaluation

The output json results of pretrained model can be downloaded from: [Google Drive], [Weiyun]

# evaluate THUMOS14 fusion result as example
python3 eval.py output/thumos14_fusion.json

mAP at tIoU 0.3 is 0.6728296149479254
mAP at tIoU 0.4 is 0.6242590551201842
mAP at tIoU 0.5 is 0.5546668739091394
mAP at tIoU 0.6 is 0.4374840824921885
mAP at tIoU 0.7 is 0.3110112542745055

Training

# train the RGB model
python3 AFSD/thumos14/train.py configs/thumos14.yaml --lw=10 --cw=1 --piou=0.5

# train the flow model
python3 AFSD/thumos14/train.py configs/thumos14_flow.yaml --lw=10 --cw=1 --piou=0.5

Citation

If you find this project useful for your research, please use the following BibTeX entry.

@inproceedings{lin2021afsd,
  title={Learning Salient Boundary Feature for Anchor-free Temporal Action Localization},
  author={Chuming Lin*, Chengming Xu*, Donghao Luo, Yabiao Wang, Ying Tai, Chengjie Wang, Jilin Li, Feiyue Huang, Yanwei Fu},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2021}
}
Owner
Tencent YouTu Research
Tencent YouTu Research
This repository lets you train neural networks models for performing end-to-end full-page handwriting recognition using the Apache MXNet deep learning frameworks on the IAM Dataset.

Handwritten Text Recognition (OCR) with MXNet Gluon These notebooks have been created by Jonathan Chung, as part of his internship as Applied Scientis

Amazon Web Services - Labs 422 Jan 03, 2023
text detection mainly based on ctpn model in tensorflow, id card detect, connectionist text proposal network

text-detection-ctpn Scene text detection based on ctpn (connectionist text proposal network). It is implemented in tensorflow. The origin paper can be

Shaohui Ruan 3.3k Dec 30, 2022
Tool which allow you to detect and translate text.

Text detection and recognition This repository contains tool which allow to detect region with text and translate it one by one. Description Two pretr

Damian Panek 176 Nov 28, 2022
This is a GUI program which consist of 4 OpenCV projects

Tkinter-OpenCV Project Using Tkinter, Opencv, Mediapipe This is a python GUI program using Tkinter which consist of 4 OpenCV projects 1. Finger Counte

Arya Bagde 3 Feb 22, 2022
Tesseract Open Source OCR Engine (main repository)

Tesseract OCR About This package contains an OCR engine - libtesseract and a command line program - tesseract. Tesseract 4 adds a new neural net (LSTM

48.4k Jan 09, 2023
零样本学习测评基准,中文版

ZeroCLUE 零样本学习测评基准,中文版 零样本学习是AI识别方法之一。 简单来说就是识别从未见过的数据类别,即训练的分类器不仅仅能够识别出训练集中已有的数据类别, 还可以对于来自未见过的类别的数据进行区分。 这是一个很有用的功能,使得计算机能够具有知识迁移的能力,并无需任何训练数据, 很符合现

CLUE benchmark 27 Dec 10, 2022
A python program to block out your face

Readme This is a small program I threw together in about 6 hours to block out your face. It probably doesn't work very well, so be warned. By default,

1 Oct 17, 2021
Run tesseract with the tesserocr bindings with @OCR-D's interfaces

ocrd_tesserocr Crop, deskew, segment into regions / tables / lines / words, or recognize with tesserocr Introduction This package offers OCR-D complia

OCR-D 38 Oct 14, 2022
A list of hyperspectral image super-solution resources collected by Junjun Jiang

A list of hyperspectral image super-resolution resources collected by Junjun Jiang. If you find that important resources are not included, please feel free to contact me.

Junjun Jiang 301 Jan 05, 2023
Machine Leaning applied to denoise images to improve OCR Accuracy

Machine Learning to Denoise Images for Better OCR Accuracy This project is an adaptation of this tutorial and used only for learning purposes: https:/

Antonio Bri Pérez 2 Nov 16, 2022
Read-only mirror of https://gitlab.gnome.org/GNOME/ocrfeeder

================================= OCRFeeder - A Complete OCR Suite ================================= OCRFeeder is a complete Optical Character Recogn

GNOME Github Mirror 81 Dec 23, 2022
Aloception is a set of package for computer vision: aloscene, alodataset, alonet.

Aloception is a set of package for computer vision: aloscene, alodataset, alonet.

Visual Behavior 86 Dec 28, 2022
Using python libraries to track hands

Python-HandTracking Using python libraries to track hands on a camera Uses cv2 and mediapipe libraries custom hand tracking module PyCharm IDE Final E

Martin Matsudaira 1 Dec 17, 2021
LEARN OPENCV IN 3 HOURS USING PYTHON - INCLUDING EXAMPLE PROJECTS

LEARN OPENCV IN 3 HOURS USING PYTHON - INCLUDING EXAMPLE PROJECTS

Murtaza Hassan 815 Dec 29, 2022
Slice a single image into multiple pieces and create a dataset from them

OpenCV Image to Dataset Converter Slice a single image of Persian digits into mu

Meysam Parvizi 14 Dec 29, 2022
textspotter - An End-to-End TextSpotter with Explicit Alignment and Attention

An End-to-End TextSpotter with Explicit Alignment and Attention This is initially described in our CVPR 2018 paper. Getting Started Installation Clone

Tong He 323 Nov 10, 2022
Comparison-of-OCR (KerasOCR, PyTesseract,EasyOCR)

Optical Character Recognition OCR (Optical Character Recognition) is a technology that enables the conversion of document types such as scanned paper

21 Dec 25, 2022
Automatically resolve RidderMaster based on TensorFlow & OpenCV

AutoRiddleMaster Automatically resolve RidderMaster based on TensorFlow & OpenCV 基于 TensorFlow 和 OpenCV 实现的全自动化解御迷士小马谜题 Demo How to use Deploy the ser

神龙章轩 5 Nov 19, 2021
A bot that plays TFT using OCR. Keeps track of bench, board, items, and plays the user defined team comp.

NOTES: To ensure best results, make sure you are running this on a computer that has decent specs. 1920x1080 fullscreen is required in League, game mu

francis 125 Dec 30, 2022
This repository contains codes on how to handle mouse event using OpenCV

Handling-Mouse-Click-Events-Using-OpenCV This repository contains codes on how t

Happy N. Monday 3 Feb 15, 2022