Hand gesture recognition model that can be used as a remote control for a smart TV.

Last update: Aug 11, 2022

Overview

Gesture_recognition

The training data consists of a few hundred videos, each categorised into one of five classes. Each video (typically 2-3 seconds long) is divided into a sequence of 30 frames (images). The videos were recorded by various people performing one of the five gestures in front of a webcam, similar to the one the smart TV will use. Each gesture corresponds to a specific command:

Thumbs up: Increase the volume
Thumbs down: Decrease the volume
Left swipe: 'Jump' backwards 10 seconds
Right swipe: 'Jump' forward 10 seconds
Stop: Pause the movie
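
The gesture-to-command mapping above can be written as a small lookup table. This is a minimal sketch: the `Command` enum, the dictionary keys and the `command_for` helper are hypothetical names chosen for illustration, and the numeric class labels are deliberately left out because the list above does not say which label belongs to which gesture.

```python
from enum import Enum


class Command(Enum):
    """TV commands named in the gesture list (identifiers are illustrative)."""
    VOLUME_UP = "increase the volume"
    VOLUME_DOWN = "decrease the volume"
    JUMP_BACK = "jump backwards 10 seconds"
    JUMP_FORWARD = "jump forward 10 seconds"
    PAUSE = "pause the movie"


# Hypothetical keys: lower-cased gesture names with underscores.
GESTURE_TO_COMMAND = {
    "thumbs_up": Command.VOLUME_UP,
    "thumbs_down": Command.VOLUME_DOWN,
    "left_swipe": Command.JUMP_BACK,
    "right_swipe": Command.JUMP_FORWARD,
    "stop": Command.PAUSE,
}


def command_for(gesture: str) -> Command:
    """Look up the command for a gesture name such as 'Thumbs up'."""
    key = gesture.strip().lower().replace(" ", "_")
    return GESTURE_TO_COMMAND[key]
```

A classifier's predicted gesture name could then be passed to `command_for` to decide what the TV should do.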

Each video is a sequence of 30 frames (or images).

https://www.kaggle.com/pratyushh/gesture-data

The data is in a zip file that contains a 'train' folder and a 'val' folder, with one CSV file for each folder. Each folder is divided into subfolders, and each subfolder represents one video of a particular gesture and contains its 30 frames (images). All images in a given video subfolder have the same dimensions, but different videos may have different dimensions. Specifically, frames come in one of two sizes, 360x360 or 120x160, depending on the webcam used to record the video.
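
Because the frames come in two sizes, they usually have to be brought to a common shape before they can be stacked into one array. The following is a minimal sketch, assuming NumPy is available and assuming a 120x120 target size, which the description above does not prescribe. All function names are hypothetical. It centre-crops each frame to a square and resizes with nearest-neighbour sampling; a real pipeline would more likely use an image library's resize routine.

```python
import numpy as np

TARGET_H, TARGET_W = 120, 120  # assumed common size, not taken from the source


def center_crop_square(frame: np.ndarray) -> np.ndarray:
    """Crop the largest centred square out of an (H, W, C) frame."""
    h, w = frame.shape[:2]
    side = min(h, w)
    top = (h - side) // 2
    left = (w - side) // 2
    return frame[top:top + side, left:left + side]


def resize_nearest(frame: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize by sampling evenly spaced rows and columns."""
    h, w = frame.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return frame[rows[:, None], cols]


def normalise_frame(frame: np.ndarray) -> np.ndarray:
    """Crop, resize and scale pixel values from [0, 255] to [0, 1]."""
    square = center_crop_square(frame)
    resized = resize_nearest(square, TARGET_H, TARGET_W)
    return resized.astype(np.float32) / 255.0


def normalise_video(frames) -> np.ndarray:
    """Stack normalised frames into a (num_frames, H, W, C) array."""
    return np.stack([normalise_frame(f) for f in frames])


# Synthetic stand-ins for the two webcam formats (all-white frames).
large_frame = np.full((360, 360, 3), 255, dtype=np.uint8)
small_frame = np.full((120, 160, 3), 255, dtype=np.uint8)

video_a = normalise_video([large_frame] * 30)
video_b = normalise_video([small_frame] * 30)
```

Both `video_a` and `video_b` end up with the same shape, so videos from either webcam can share a batch. The crop works whichever of the two numbers in "120x160" is the height.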

Each row of a CSV file represents one video and contains three main pieces of information: the name of the subfolder containing the 30 images of the video, the name of the gesture, and the numeric label (from 0 to 4) of the video.
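
A row with these three fields can be parsed with the standard `csv` module. This is a minimal sketch under stated assumptions: the semicolon delimiter, the `VideoRow` type, the `read_video_rows` helper and the sample rows (including their label values) are all illustrative and are not taken from the dataset itself.

```python
import csv
import io
from typing import List, NamedTuple


class VideoRow(NamedTuple):
    folder: str   # subfolder holding the 30 frames of one video
    gesture: str  # gesture name
    label: int    # numeric class label, expected to be 0..4


def read_video_rows(handle, delimiter: str = ";") -> List[VideoRow]:
    """Parse rows of (folder, gesture, label); the delimiter is an assumption."""
    rows = []
    for record in csv.reader(handle, delimiter=delimiter):
        if len(record) < 3:
            continue  # skip blank or malformed lines
        label = int(record[2])
        if not 0 <= label <= 4:
            raise ValueError(f"label out of range: {label}")
        rows.append(VideoRow(record[0].strip(), record[1].strip(), label))
    return rows


# Made-up rows for illustration only.
sample = io.StringIO("video_0001;Left_Swipe;0\nvideo_0002;Stop;2\n")
rows = read_video_rows(sample)
```

With a real file, the same function could be called as `read_video_rows(open(path, newline=""))`, adjusting `delimiter` to whatever the CSV actually uses.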

Two types of architectures are commonly used to analyse videos with neural networks. One is the standard CNN + RNN architecture, in which you pass the images of a video through a CNN that extracts a feature vector for each image, and then pass the sequence of these feature vectors through an RNN.

The other popular architecture used to process videos is a natural extension of CNNs - a 3D convolutional network.

Convolutions + RNN

The Conv2D network extracts a feature vector for each image, and the sequence of these feature vectors is then fed to an RNN-based network. The output of the RNN is passed through a regular softmax layer (for a classification problem such as this one).
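
The data flow described above (per-frame features, then a recurrent pass, then a softmax over five classes) can be traced with plain NumPy. This is a toy sketch and not the repository's model. The "CNN" is replaced by average pooling plus a linear projection, the RNN is a vanilla tanh recurrence, and all weights are random and untrained. Every name and dimension other than the 30 frames and 5 classes is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_FRAMES, HEIGHT, WIDTH, CHANNELS = 30, 120, 120, 3
FEATURE_DIM, HIDDEN_DIM, NUM_CLASSES = 16, 8, 5


def frame_features(frame: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Stand-in for the CNN: pool a frame to a 4x4 grid, then project it."""
    h, w, c = frame.shape
    pooled = frame.reshape(4, h // 4, 4, w // 4, c).mean(axis=(1, 3))  # (4, 4, c)
    return np.tanh(pooled.reshape(-1) @ projection)                    # (FEATURE_DIM,)


def rnn_classify(features, w_in, w_rec, w_out) -> np.ndarray:
    """Vanilla RNN over the feature sequence, softmax on the last hidden state."""
    hidden = np.zeros(w_rec.shape[0])
    for x in features:
        hidden = np.tanh(x @ w_in + hidden @ w_rec)
    logits = hidden @ w_out
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exp / exp.sum()


video = rng.random((NUM_FRAMES, HEIGHT, WIDTH, CHANNELS))
projection = rng.normal(size=(4 * 4 * CHANNELS, FEATURE_DIM))
w_in = rng.normal(size=(FEATURE_DIM, HIDDEN_DIM))
w_rec = rng.normal(size=(HIDDEN_DIM, HIDDEN_DIM))
w_out = rng.normal(size=(HIDDEN_DIM, NUM_CLASSES))

features = np.stack([frame_features(f, projection) for f in video])  # (30, 16)
probabilities = rnn_classify(features, w_in, w_rec, w_out)           # (5,)
```

The shapes are the point here: a video of shape (30, 120, 120, 3) becomes a (30, 16) feature sequence and then a single vector of five class probabilities. In a trained model the pooling stand-in would be replaced by convolutional layers and the recurrence typically by a GRU or LSTM.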

3D Convolutional Network, or Conv3D

3D convolutions are a natural extension of the 2D convolutions you are already familiar with. Just as in a 2D convolution you move the filter in two directions (x and y), in a 3D convolution you move the filter in three directions (x, y and z). In this case, the input to the 3D convolution is a video (a sequence of 30 RGB images), so z runs along the sequence of frames.
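
To make the three-direction sliding concrete, here is a naive single-channel 3D convolution written with explicit loops. It is a minimal sketch, assuming NumPy, 'valid' padding and stride 1, and the function name is hypothetical. As in most deep learning libraries, the operation is technically a cross-correlation (the kernel is not flipped). For RGB input, the same sum would also run over the colour channels.

```python
import numpy as np


def conv3d_valid(volume: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide a (kt, kh, kw) kernel over a (T, H, W) volume along z, y and x."""
    t, h, w = volume.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((t - kt + 1, h - kh + 1, w - kw + 1))
    for z in range(out.shape[0]):          # along the frames (time)
        for y in range(out.shape[1]):      # along the image height
            for x in range(out.shape[2]):  # along the image width
                patch = volume[z:z + kt, y:y + kh, x:x + kw]
                out[z, y, x] = np.sum(patch * kernel)
    return out


clip = np.ones((30, 8, 8))    # 30 tiny single-channel frames
kernel = np.ones((3, 3, 3))   # looks at 3 frames and a 3x3 window at once
response = conv3d_valid(clip, kernel)
```

With a 3x3x3 kernel and no padding, the 30-frame clip shrinks to 28 steps along the time axis, just as each 8x8 frame shrinks to 6x6 spatially.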

Owner

Pratyush Negi
