This repo contains the official code of our work SAM-SLR which won the CVPR 2021 Challenge on Large Scale Signer Independent Isolated Sign Language Recognition.

Overview

Skeleton Aware Multi-modal Sign Language Recognition

By Songyao Jiang, Bin Sun, Lichen Wang, Yue Bai, Kunpeng Li and Yun Fu.

Smile Lab @ Northeastern University

Python 3.7 Packagist Last Commit License: CC0 4.0 PWC


This repo contains the official code of Skeleton Aware Multi-modal Sign Language Recognition (SAM-SLR) that ranked 1st in CVPR 2021 Challenge: Looking at People Large Scale Signer Independent Isolated Sign Language Recognition.

Our paper has been accepted to CVPR21 Workshop. A preprint version is available on arXiv. Please cite our paper if you find this repo useful in your research.

News

[2021/04/10] Our workshop paper has been accepted. Citation info updated.

[2021/03/24] A preprint version of our paper is released here.

[2021/03/20] Our work has been verified and announced by the organizers as the 1st place winner of the challenge!

[2021/03/15] The code is released to public on GitHub.

[2021/03/11] Our team (smilelab2021) ranked 1st in both tracks and here are the links to the leaderboards:

Table of Contents

Data Preparation

Download AUTSL Dataset.

We processed the dataset into six modalities in total: skeleton, skeleton features, rgb frames, flow color, hha and flow depth.

  1. Please put original train, val, test videos in data folder as
    data
    ├── train
    │   ├── signer0_sample1_color.mp4
    │   ├── signer0_sample1_depth.mp4
    │   ├── signer0_sample2_color.mp4
    │   ├── signer0_sample2_depth.mp4
    │   └── ...
    ├── val
    │   └── ...
    └── test
        └── ...
  1. Follow the data_processs/readme.md to process the data.

  2. Use TPose/data_process to extract wholebody pose features.

Requirements and Docker Image

The code is written using Anaconda Python >= 3.6 and Pytorch 1.7 with OpenCV.

Detailed enviroment requirment can be found in requirement.txt in each code folder.

For convenience, we provide a Nvidia docker image to run our code.

Download Docker Image

Pretrained Models

We provide pretrained models for all modalities to reproduce our submitted results. Please download them at and put them into corresponding folders.

Download Pretrained Models

Usage

Reproducing the Results Submitted to CVPR21 Challenge

To test our pretrained model, please put them under each code folders and run the test code as instructed below. To ensemble the tested results and reproduce our final submission. Please copy all the results .pkl files to ensemble/ and follow the instruction to ensemble our final outputs.

For a step-by-step instruction, please see reproduce.md.

Skeleton Keypoints

Skeleton modality can be trained, finetuned and tested using the code in SL-GCN/ folder. Please follow the SL-GCN/readme.md instruction to prepare skeleton data into four streams (joint, bone, joint_motion, bone motion).

Basic usage:

python main.py --config /path/to/config/file

To train, finetune and test our models, please change the config path to corresponding config files. Detailed instruction can be found in SL-GCN/readme.md

Skeleton Feature

For the skeleton feature, we propose a Separable Spatial-Temporal Convolution Network (SSTCN) to capture spatio-temporal information from those features.

Please follow the instruction in SSTCN/readme.txt to prepare the data, train and test the model.

RGB Frames

The RGB frames modality can be trained, finetuned and tested using the following commands in Conv3D/ folder.

python Sign_Isolated_Conv3D_clip.py

python Sign_Isolated_Conv3D_clip_finetune.py

python Sign_Isolated_Conv3D_clip_test.py

Detailed instruction can be found in Conv3D/readme.md

Optical Flow

The RGB optical flow modality can be trained, finetuned and tested using the following commands in Conv3D/ folder.

python Sign_Isolated_Conv3D_flow_clip.py

python Sign_Isolated_Conv3D_flow_clip_funtine.py

python Sign_Isolated_Conv3D_flow_clip_test.py

Detailed instruction can be found in Conv3D/readme.md

Depth HHA

The Depth HHA modality can be trained, finetuned and tested using the following commands in Conv3D/ folder.

python Sign_Isolated_Conv3D_hha_clip_mask.py

python Sign_Isolated_Conv3D_hha_clip_mask_finetune.py

python Sign_Isolated_Conv3D_hha_clip_mask_test.py

Detailed instruction can be found in Conv3D/readme.md

Depth Flow

The Depth Flow modality can be trained, finetuned and tested using the following commands in Conv3D/ folder.

python Sign_Isolated_Conv3D_depth_flow_clip.py

python Sign_Isolated_Conv3D_depth_flow_clip_finetune.py

python Sign_Isolated_Conv3D_depth_flow_clip_test.py

Detailed instruction can be found in Conv3D/readme.md

Model Ensemble

For both RGB and RGBD track, the tested results of all modalities need to be ensemble together to generate the final results.

  1. For RGB track, we use the results from skeleton, skeleton feature, rgb, and flow color modalities to ensemble the final results.

    a. Test the model using newly trained weights or provided pretrained weights.

    b. Copy all the test results to ensemble folder and rename them as their modality names.

    c. Ensemble SL-GCN results from joint, bone, joint motion, bone motion streams in gcn/ .

     python ensemble_wo_val.py; python ensemble_finetune.py
    

    c. Copy test_gcn_w_val_finetune.pkl to ensemble/. Copy RGB, TPose and optical flow results to ensemble/. Ensemble final prediction.

     python ensemble_multimodal_rgb.py
    

    Final predictions are saved in predictions.csv

  2. For RGBD track, we use the results from skeleton, skeleton feature, rgb, flow color, hha and flow depth modalities to ensemble the final results. a. copy hha and flow depth modalities to ensemble/ folder, then

     python ensemble_multimodal_rgb.py
    

To reproduce our results in CVPR21Challenge, we provide .pkl files to ensemble and obtain our final submitted predictions. Detailed instruction can be find in ensemble/readme.md

License

Licensed under the Creative Commons Zero v1.0 Universal license with the following exceptions:

  • The code is released for academic research use only. Commercial use is prohibited.
  • Published versions (changed or unchanged) must include a reference to the origin of the code.

Citation

If you find this project useful in your research, please cite our paper

@inproceedings{jiang2021skeleton,
  title={Skeleton Aware Multi-modal Sign Language Recognition},
  author={Jiang, Songyao and Sun, Bin and Wang, Lichen and Bai, Yue and Li, Kunpeng and Fu, Yun},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  year={2021}
}

@article{jiang2021skeleton,
  title={Skeleton Aware Multi-modal Sign Language Recognition},
  author={Jiang, Songyao and Sun, Bin and Wang, Lichen and Bai, Yue and Li, Kunpeng and Fu, Yun},
  journal={arXiv preprint arXiv:2103.08833},
  year={2021}
}

Reference

https://github.com/Sun1992/SSTCN-for-SLR

https://github.com/jin-s13/COCO-WholeBody

https://github.com/open-mmlab/mmpose

https://github.com/0aqz0/SLR

https://github.com/kchengiva/DecoupleGCN-DropGraph

https://github.com/HRNet/HRNet-Human-Pose-Estimation

https://github.com/charlesCXK/Depth2HHA

Owner
Isen (Songyao Jiang)
Isen (Songyao Jiang)
Code for "Steerable Pyramid Transform Enables Robust Left Ventricle Quantification"

Code for "Steerable Pyramid Transform Enables Robust Left Ventricle Quantification" This is an end-to-end framework for accurate and robust left ventr

2 Jul 09, 2022
MoCap-Solver: A Neural Solver for Optical Motion Capture Data

MoCap-Solver is a data-driven-based robust marker denoising method, which takes raw mocap markers as input and outputs corresponding clean markers and skeleton motions.

55 Dec 28, 2022
🏖 Keras Implementation of Painting outside the box

Keras implementation of Image OutPainting This is an implementation of Painting Outside the Box: Image Outpainting paper from Standford University. So

Bendang 1.1k Dec 10, 2022
PyTorch implementation of CVPR 2020 paper (Reference-Based Sketch Image Colorization using Augmented-Self Reference and Dense Semantic Correspondence) and pre-trained model on ImageNet dataset

Reference-Based-Sketch-Image-Colorization-ImageNet This is a PyTorch implementation of CVPR 2020 paper (Reference-Based Sketch Image Colorization usin

Yuzhi ZHAO 11 Jul 28, 2022
Adversarial Framework for (non-) Parametric Image Stylisation Mosaics

Fully Adversarial Mosaics (FAMOS) Pytorch implementation of the paper "Copy the Old or Paint Anew? An Adversarial Framework for (non-) Parametric Imag

Zalando Research 120 Dec 24, 2022
Course about deep learning for computer vision and graphics co-developed by YSDA and Skoltech.

Deep Vision and Graphics This repo supplements course "Deep Vision and Graphics" taught at YSDA @fall'21. The course is the successor of "Deep Learnin

Yandex School of Data Analysis 160 Jan 02, 2023
FrankMocap: A Strong and Easy-to-use Single View 3D Hand+Body Pose Estimator

FrankMocap pursues an easy-to-use single view 3D motion capture system developed by Facebook AI Research (FAIR). FrankMocap provides state-of-the-art 3D pose estimation outputs for body, hand, and bo

Facebook Research 1.9k Jan 07, 2023
Existing Literature about Machine Unlearning

Machine Unlearning Papers 2021 Brophy and Lowd. Machine Unlearning for Random Forests. In ICML 2021. Bourtoule et al. Machine Unlearning. In IEEE Symp

Jonathan Brophy 213 Jan 08, 2023
Code for CVPR2021 paper 'Where and What? Examining Interpretable Disentangled Representations'.

PS-SC GAN This repository contains the main code for training a PS-SC GAN (a GAN implemented with the Perceptual Simplicity and Spatial Constriction c

Xinqi/Steven Zhu 40 Dec 16, 2022
The Python code for the paper A Hybrid Quantum-Classical Algorithm for Robust Fitting

About The Python code for the paper A Hybrid Quantum-Classical Algorithm for Robust Fitting The demo program was only tested under Conda in a standard

Anh-Dzung Doan 5 Nov 28, 2022
Independent and minimal implementations of some reinforcement learning algorithms using PyTorch (including PPO, A3C, A2C, ...).

PyTorch RL Minimal Implementations There are implementations of some reinforcement learning algorithms, whose characteristics are as follow: Less pack

Gemini Light 4 Dec 31, 2022
The code release of paper Low-Light Image Enhancement with Normalizing Flow

[AAAI 2022] Low-Light Image Enhancement with Normalizing Flow Paper | Project Page Low-Light Image Enhancement with Normalizing Flow Yufei Wang, Renji

Yufei Wang 176 Jan 06, 2023
Modifications of the official PyTorch implementation of StyleGAN3. Let's easily generate images and videos with StyleGAN2/2-ADA/3!

Alias-Free Generative Adversarial Networks (StyleGAN3) Official PyTorch implementation of the NeurIPS 2021 paper Alias-Free Generative Adversarial Net

Diego Porres 185 Dec 24, 2022
AdelaiDet is an open source toolbox for multiple instance-level detection and recognition tasks.

AdelaiDet is an open source toolbox for multiple instance-level detection and recognition tasks.

Adelaide Intelligent Machines (AIM) Group 3k Jan 02, 2023
[ICLR 2021] "Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective" by Wuyang Chen, Xinyu Gong, Zhangyang Wang

Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective [PDF] Wuyang Chen, Xinyu Gong, Zhangyang Wang In ICLR 2

VITA 156 Nov 28, 2022
Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP

Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP Abstract: We introduce a method that allows to automatically se

Daniil Pakhomov 134 Dec 19, 2022
Decision Transformer: A brand new Offline RL Pattern

DecisionTransformer_StepbyStep Intro Decision Transformer: A brand new Offline RL Pattern. 这是关于NeurIPS 2021 热门论文Decision Transformer的复现。 👍 原文地址: Deci

Irving 14 Nov 22, 2022
Cave Generation using metaballs in Blender. Originally created by sdfgeoff, Edited by Myself (Archie Jaskowicz).

Blender-Cave-Generation Cave Generation using metaballs in Blender. Originally created by sdfgeoff, Edited by Myself (Archie Jaskowicz). Installation

2 Dec 28, 2022
Model of an AI powered sign language interpreter.

TEXT AND SPEECH TO SIGN LANGUAGE. A web application which takes in text or live audio speech recording as input, converts and displays the relevant Si

Mark Gatere 4 Mar 30, 2022
This is the official code of our paper "Diversity-based Trajectory and Goal Selection with Hindsight Experience Relay" (PRICAI 2021)

Diversity-based Trajectory and Goal Selection with Hindsight Experience Replay This is the official implementation of our paper "Diversity-based Traje

Tianhong Dai 6 Jul 18, 2022