Goal of the project : Detecting Temporal Boundaries in Sign Language videos

Overview

MVA RecVis course final project :

Goal of the project : Detecting Temporal Boundaries in Sign Language videos.

Sign language automatic indexing is an important challenge to develop better communication tools for the deaf community. However, annotated datasets for sign langage are limited, and there are few people with skills to anotate such data, which makes it hard to train performant machine learning models. An important challenge is therefore to :

  • Increase available training datasets.
  • Make labeling easier for professionnals to reduce risks of bad annotations.

In this context, techniques have emerged to perform automatic sign segmentation in videos, by marking the boundaries between individual signs in sign language videos. The developpment of such tools offers the potential to alleviate the limited supply of labelled dataset currently available for sign research.

demo

Previous work and personal contribution :

This repository provides code for the Object Recognition & Computer Vision (RecVis) course Final project. For more details please refer the the project report report.pdf. In this project, we reproduced the results obtained on the following paper (by using the code from this repository) :

We used the pre-extracted frame-level features obtained by applying the I3D model on videos to retrain the MS-TCN architecture for frame-level binary classification and reproduce the papers results. The tests folder proposes a notebook for reproducing the original paper results, with a meanF1B = 68.68 on the evaluation set of the BSL Corpus.

We further implemented new models in order to improve this result. We wanted to try attention based models as they have received recently a huge gain of interest in the vision research community. We first tried to train a Vanilla Transformer Encoder from scratch, but the results were not satisfactory.

  • Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin: (2018).

We then implemented the ASFormer model (Transformer for Action Segementation), using this code : a hybrid transformer model using some interesting ideas from the MS-TCN architecture. The motivations behind the model and its architecture are detailed in the following paper :

We trained this model on the I3D extracted features and obtained an improvement over the MS-TCN architecture. The results are given in the following table :

ID Model mF1B mF1S
1 MS-TCN 68.68±0.6 47.71±0.8
2 Transformer Encoder 60.28±0.3 42.70±0.2
3 ASFormer 69.79±0.2 49.23±1.2

Contents

Setup

# Clone this repository
git clone https://github.com/loubnabnl/Sign-Segmentation-with-Transformers.git
cd Sign-Segmentation-with-Transformers/
# Create signseg_env environment
conda env create -f environment.yml
conda activate signseg_env

Data and models

You can download the pretrained models (I3D and MS-TCN) (models.zip [302MB]) and data (data.zip [5.5GB]) used in the experiments here or by executing download/download_*.sh. The unzipped data/ and models/ folders should be located on the root directory of the repository (for using the demo downloading the models folder is sufficient).

You can download our best pretrained ASFormer model weights here.

Data:

Please cite the original datasets when using the data: BSL Corpus The authors of github.com/RenzKa/sign-segmentation provided the pre-extracted features and metadata. See here for a detailed description of the data files.

  • Features: data/features/*/*/features.mat
  • Metadata: data/info/*/info.pkl

Models:

  • I3D weights, trained for sign classification: models/i3d/*.pth.tar
  • MS-TCN weights for the demo (see tables below for links to the other models): models/ms-tcn/*.model
  • As_former weights of our best model : models/asformer/*.model

The folder structure should be as below:

sign-segmentation/models/
  i3d/
    i3d_kinetics_bslcp.pth.tar
  ms-tcn/
    mstcn_bslcp_i3d_bslcp.model
  asformer/
    best_asformer_bslcp.model

Demo

The demo folder contains a sample script to estimate the segments of a given sign language video, one can run demo.pyto get a visualization on a sample video.

cd demo
python demo.py

The demo will:

  1. use the models/i3d/i3d_kinetics_bslcp.pth.tar pretrained I3D model to extract features,
  2. use the models/asformer/best_asformer_model.model pretrained ASFormer model to predict the segments out of the features.
  3. save results.

Training

To train I3D please refer to github.com/RenzKa/sign-segmentation. To train ASFormer on the pre-extracted I3D features run main.py, you can change hyperparameters in the arguments inside the file. Or you can run the notebook in the folder test_asformer.

Citation

If you use this code and data, please cite the original papers following:

@inproceedings{Renz2021signsegmentation_a,
    author       = "Katrin Renz and Nicolaj C. Stache and Samuel Albanie and G{\"u}l Varol",
    title        = "Sign Language Segmentation with Temporal Convolutional Networks",
    booktitle    = "ICASSP",
    year         = "2021",
}
@article{yi2021asformer,
  title={Asformer: Transformer for action segmentation},
  author={Yi, Fangqiu and Wen, Hongyu and Jiang, Tingting},
  journal={arXiv preprint arXiv:2110.08568},
  year={2021}
}

License

The license in this repository only covers the code. For data.zip and models.zip we refer to the terms of conditions of original datasets.

Acknowledgements

The code builds on the github.com/RenzKa/sign-segmentation and github.com/ChinaYi/ASFormer repositories.

Owner
Loubna Ben Allal
MVA (Mathematics, Vision, Learning) student at ENS Paris Saclay.
Loubna Ben Allal
Py4fi2nd - Jupyter Notebooks and code for Python for Finance (2nd ed., O'Reilly) by Yves Hilpisch.

Python for Finance (2nd ed., O'Reilly) This repository provides all Python codes and Jupyter Notebooks of the book Python for Finance -- Mastering Dat

Yves Hilpisch 1k Jan 05, 2023
Blender Add-On for slicing meshes with planes

MeshSlicer Blender Add-On for slicing meshes with multiple overlapping planes at once. This is a simple Blender addon to slice a silmple mesh with mul

52 Dec 12, 2022
Meta-Learning Sparse Implicit Neural Representations (NeurIPS 2021)

Meta-SparseINR Official PyTorch implementation of "Meta-learning Sparse Implicit Neural Representations" (NeurIPS 2021) by Jaeho Lee*, Jihoon Tack*, N

Jaeho Lee 41 Nov 10, 2022
End-To-End Memory Network using Tensorflow

MemN2N Implementation of End-To-End Memory Networks with sklearn-like interface using Tensorflow. Tasks are from the bAbl dataset. Get Started git clo

Dominique Luna 339 Oct 27, 2022
Official implementation of YOGO for Point-Cloud Processing

You Only Group Once: Efficient Point-Cloud Processing with Token Representation and Relation Inference Module By Chenfeng Xu, Bohan Zhai, Bichen Wu, T

Chenfeng Xu 67 Dec 20, 2022
Code for the paper: Learning Adversarially Robust Representations via Worst-Case Mutual Information Maximization (https://arxiv.org/abs/2002.11798)

Representation Robustness Evaluations Our implementation is based on code from MadryLab's robustness package and Devon Hjelm's Deep InfoMax. For all t

Sicheng 19 Dec 07, 2022
Official repository for ABC-GAN

ABC-GAN The work represented in this repository is the result of a 14 week semesterthesis on photo-realistic image generation using generative adversa

IgorSusmelj 10 Jun 23, 2022
A Comprehensive Analysis of Weakly-Supervised Semantic Segmentation in Different Image Domains (IJCV submission)

wsss-analysis The code of: A Comprehensive Analysis of Weakly-Supervised Semantic Segmentation in Different Image Domains, arXiv pre-print 2019 paper.

Lyndon Chan 48 Dec 18, 2022
Video Corpus Moment Retrieval with Contrastive Learning (SIGIR 2021)

Video Corpus Moment Retrieval with Contrastive Learning PyTorch implementation for the paper "Video Corpus Moment Retrieval with Contrastive Learning"

ZHANG HAO 42 Dec 29, 2022
This project hosts the code for implementing the ISAL algorithm for object detection and image classification

Influence Selection for Active Learning (ISAL) This project hosts the code for implementing the ISAL algorithm for object detection and image classifi

25 Sep 11, 2022
An official implementation of "Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation" (CVPR 2021) in PyTorch.

BANA This is the implementation of the paper "Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation". For more inf

CV Lab @ Yonsei University 59 Dec 12, 2022
Drone-based Joint Density Map Estimation, Localization and Tracking with Space-Time Multi-Scale Attention Network

DroneCrowd Paper Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark. Introduction This paper proposes a space-time multi-scale atte

VisDrone 98 Nov 16, 2022
The world's largest toxicity dataset.

The Toxicity Dataset by Surge AI Saving the internet is fun. Combing through thousands of online comments to build a toxicity dataset isn't. That's wh

Surge AI 134 Dec 19, 2022
The official repo for OC-SORT: Observation-Centric SORT on video Multi-Object Tracking. OC-SORT is simple, online and robust to occlusion/non-linear motion.

OC-SORT Observation-Centric SORT (OC-SORT) is a pure motion-model-based multi-object tracker. It aims to improve tracking robustness in crowded scenes

Jinkun Cao 325 Jan 05, 2023
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams

Adversarial Robustness Toolbox (ART) is a Python library for Machine Learning Security. ART provides tools that enable developers and researchers to defend and evaluate Machine Learning models and ap

3.4k Jan 04, 2023
Steer OpenAI's Jukebox with Music Taggers

TagBox Steer OpenAI's Jukebox with Music Taggers! The closest thing we have to VQGAN+CLIP for music! Unsupervised Source Separation By Steering Pretra

Ethan Manilow 34 Nov 02, 2022
Rlmm blender toolkit - A set of tools to streamline level generation in UDK straight from Blender

rlmm_blender_toolkit A set of tools to streamline level generation in UDK straig

Rocket League Mapmaking 0 Jan 15, 2022
Deep Learning Package based on TensorFlow

White-Box-Layer is a Python module for deep learning built on top of TensorFlow and is distributed under the MIT license. The project was started in M

YeongHyeon Park 7 Dec 27, 2021
Implementation for the paper SMPLicit: Topology-aware Generative Model for Clothed People (CVPR 2021)

SMPLicit: Topology-aware Generative Model for Clothed People [Project] [arXiv] License Software Copyright License for non-commercial scientific resear

Enric Corona 225 Dec 13, 2022
Keep CALM and Improve Visual Feature Attribution

Keep CALM and Improve Visual Feature Attribution Jae Myung Kim1*, Junsuk Choe1*, Zeynep Akata2, Seong Joon Oh1† * Equal contribution † Corresponding a

NAVER AI 90 Dec 07, 2022