Sign Language Recognition service utilizing a deep learning model with Long Short-Term Memory to perform sign language recognition.

Overview

Sign Language Recognition Service

This service uses a deep learning model built on Long Short-Term Memory (LSTM) layers to recognize sign language. It was developed as part of a bachelor project at Aalborg University.


Requirements

  • Python 3.7
  • OpenPose 1.6.0
  • CUDA 10.0
  • cuDNN 7.5.0
  • NumPy 1.18.5
  • OpenCV 4.5.1.48
  • Flask 1.1.2
  • TensorFlow 2.0.0
  • pandas 1.1.5
  • TensorBoard
  • Matplotlib
  • Seaborn
  • scikit-learn

How to use

Installing OpenPose

  1. Install OpenPose 1.6.0 for Python by following the official guide. Note that the newest release on the OpenPose GitHub repository is 1.7.0, but this service only works with 1.6.0.

    A few things to note when installing OpenPose:

    • When cloning the OpenPose repository, use the following git command to get version 1.6.0:
      git clone --depth 1 --branch v1.6.0 https://github.com/CMU-Perceptual-Computing-Lab/openpose
      
    • Remember to run the following command on the newly cloned repository:
      git submodule update --init --recursive --remote
      
    • Use Visual Studio Enterprise 2017 to build the required files. Install this first if you do not already have it.
    • Install CUDA 10.0 and cuDNN 7.5.0 for CUDA 10.0 after installing Visual Studio Enterprise 2017.
    • When generating the files using CMake, make sure that the BUILD_PYTHON flag is enabled, and that the Python version is set to 3.7. Also make sure that the detected CUDA version is 10.0.
    • After building with Visual Studio Enterprise 2017, make sure that all necessary files have been generated.
      • There should be an openpose.dll in /x64/Release/
      • There should be an openpose.exp and an openpose.lib in /src/openpose/Release/
      • There should be a pyopenpose.cp37-win_amd64.pyd in /python/openpose/Release/
  2. Install requirements from requirements.txt

  3. Change the path in main/openpose/paths.py to the path of your OpenPose installation:

    # Change this path so it points to your OpenPose path relative to this file
    OPEN_POSE_PATH = get_relative_path(__file__, '../../../../openpose')
    
  4. If you get any errors related to OpenPose when running the service, go back and check that all instructions were followed. In particular, check that the correct CUDA and cuDNN versions are installed, that the BUILD_PYTHON flag was enabled, and that Python 3.7 was selected when generating the files.

When OpenPose is successfully installed, you can either use the existing model trained on our dataset, or you can choose to make your own dataset and train a model on this instead.


Using the service

The service exposes a single endpoint, '/recognize', which accepts POST requests and performs the recognition. The endpoint expects a sequence of base64-encoded images, which are converted into a format the classifier can consume.
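As a rough illustration, a client could encode its frames and post them as shown below. This is only a sketch: the host and port, the "images" JSON key, and the assumption that the response is JSON are all guesses made for the example, so check the service code for the actual request and response format.

```python
import base64
import json
import urllib.request

# Assumed values: the host, port, and the "images" JSON key are illustrative
# guesses, not taken from the service's source code.
SERVICE_URL = "http://localhost:5000/recognize"


def build_payload(frames):
    """Encode raw image bytes (e.g. JPEG frames) as base64 strings in a JSON body."""
    encoded = [base64.b64encode(frame).decode("ascii") for frame in frames]
    return json.dumps({"images": encoded}).encode("utf-8")


def recognize(frames, url=SERVICE_URL):
    """POST the encoded frames and return the response, assuming it is JSON."""
    request = urllib.request.Request(
        url,
        data=build_payload(frames),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, calling recognize() with a list of image byte strings, one per captured frame, would send a single POST request for the whole sequence.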


Creating a custom dataset

To create a custom dataset, open create_dataset.py and change the following constant:

DATASET_NAME = 'dsl_dataset'

Set it so that the path in the constant DATASET_DIR points to the folder where the dataset is located. This folder must contain a folder called 'src', which in turn contains one folder per label in the dataset. Each label folder should contain videos of the corresponding sign.
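Before starting the lengthy extraction, it can help to check that the folder layout matches this description. The helper below is a hypothetical sketch and not part of the repository; the list of video extensions is an assumption, so adjust it to the formats you actually use.

```python
from pathlib import Path

# Assumption: these extensions are illustrative; the script may accept others.
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov"}


def list_label_videos(dataset_dir):
    """Map each label folder under <dataset_dir>/src to its video file names."""
    src_dir = Path(dataset_dir) / "src"
    videos = {}
    for label_dir in sorted(p for p in src_dir.iterdir() if p.is_dir()):
        videos[label_dir.name] = sorted(
            f.name
            for f in label_dir.iterdir()
            if f.is_file() and f.suffix.lower() in VIDEO_EXTENSIONS
        )
    return videos
```

A label that maps to an empty list indicates a folder without any recognized videos.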

Before running the script, the following constants can be tweaked based on the desired settings:

WINDOW_LENGTH = 60
STRIDE = 5
BATCH_SIZE = 512
VAL_SPLIT = 0.2
TEST_SPLIT = 0.1
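To give a feel for WINDOW_LENGTH and STRIDE, the sketch below cuts a sequence of frames into overlapping windows. It assumes the usual sliding-window interpretation of these two constants; create_dataset.py may handle edge cases such as short videos differently.

```python
WINDOW_LENGTH = 60
STRIDE = 5


def sliding_windows(frames, window_length=WINDOW_LENGTH, stride=STRIDE):
    """Return overlapping windows of `window_length` frames, `stride` frames apart."""
    last_start = len(frames) - window_length
    return [
        frames[start:start + window_length]
        for start in range(0, last_start + 1, stride)
    ]


# A 100-frame video yields windows starting at frames 0, 5, ..., 40.
windows = sliding_windows(list(range(100)))
print(len(windows))  # 9
```

Under this interpretation, a smaller STRIDE produces more (and more heavily overlapping) training samples from the same videos, while sequences shorter than WINDOW_LENGTH produce none.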

Finally, the following constant can be changed:

CREATE_RAW_DATA = True

Initial feature extraction by OpenPose can be a fairly lengthy process, so the raw OpenPose data only needs to be created once. Setting this constant to False skips the extraction and lets you tweak the dataset settings using the features that have already been extracted. Note that the raw OpenPose data must exist before the actual dataset can be created, so the script must be run with this constant set to True at least once.

Training a custom model

To train a custom model, use the train_models.py file. Change the constant DATASET_NAME to the name of the dataset you wish to use, so that DATASET_DIR points to the correct folder. You can also specify a TensorBoard directory:

DATASET_NAME = 'dsl_dataset'
DATASET_DIR = f'.\\main\\algorithm\\datasets\\{DATASET_NAME}'
MODELS_DIR = f'.\\main\\algorithm\\models\\{DATASET_NAME}'
TENSORBOARD_DIR = f'{MODELS_DIR}\\logs'

Before running the script, you can tweak various training settings as well as the hyperparameters of the model by changing the following constants:

MODEL_NAME = "model"
EPOCHS = 25
LAYER_SIZES = [64]
DENSE_LAYERS = [0]
DENSE_ACTIVATION = "relu"
LSTM_LAYERS = [2]
LSTM_ACTIVATION = "tanh"
OUTPUT_ACTIVATION = "softmax"

Note that the trainer can train multiple models depending on these settings. Changing the LAYER_SIZES, DENSE_LAYERS and LSTM_LAYERS to contain several values will result in a model being trained for each possible combination.
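The number of models grows quickly with the number of values. The sketch below enumerates the combinations with itertools.product; the example values and the dictionary keys are made up for illustration, and the trainer may build its combinations differently.

```python
from itertools import product

# Example values with several entries each (hypothetical, for illustration).
LAYER_SIZES = [64, 128]
DENSE_LAYERS = [0, 1]
LSTM_LAYERS = [1, 2]


def model_configurations(layer_sizes, dense_layers, lstm_layers):
    """Return one configuration per combination of the three settings."""
    return [
        {"layer_size": size, "dense_layers": dense, "lstm_layers": lstm}
        for size, dense, lstm in product(layer_sizes, dense_layers, lstm_layers)
    ]


configs = model_configurations(LAYER_SIZES, DENSE_LAYERS, LSTM_LAYERS)
print(len(configs))  # 8
```

With the default single-value lists ([64], [0] and [2]), only one model is trained.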

After training your model, update paths.py in main/core/ to point to the new model by changing the constant MODEL_NAME to the name of your model:

MODEL_NAME = 'dsl_lstm.model'

Finally, it is also possible to generate a confusion matrix for your model with the generate_confusion_matrix.py script. Change the constants DATASET_NAME and MODEL_NAME so that DATASET_DIR points to your dataset file and MODEL_DIR points to your model directory:

DATASET_NAME = "dsl_dataset"
MODEL_NAME = "dsl_lstm"
DATASET_DIR = f"./main/algorithm/datasets/{DATASET_NAME}/{DATASET_NAME}.pickle"
MODEL_DIR = f"./main/algorithm/models/{DATASET_NAME}/{MODEL_NAME}"
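For readers unfamiliar with confusion matrices, the sketch below shows what one contains: the entry in row i, column j counts how often a sample with true label i was predicted as label j. It is a plain-Python illustration, independent of how generate_confusion_matrix.py is implemented, and the sign labels are made up.

```python
def confusion_matrix(true_labels, predicted_labels, labels):
    """Count predictions; rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, prediction in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[prediction]] += 1
    return matrix


# Hypothetical labels and predictions for four test samples.
labels = ["hello", "thanks"]
truth = ["hello", "hello", "thanks", "thanks"]
predicted = ["hello", "thanks", "thanks", "thanks"]
matrix = confusion_matrix(truth, predicted, labels)
print(matrix)  # [[1, 1], [0, 2]]
```

Off-diagonal entries show which signs the model confuses with each other, which can guide the collection of additional training videos.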

Happy signing :O)

Authors

  • Adil Cemalovic
  • Martin Lønne
  • Magnus Helleshøj Lund