基于openpose和图像分类的手语识别项目

Overview

手语识别


0、使用到的模型

(1). openpose,作者:CMU-Perceptual-Computing-Lab

https://github.com/CMU-Perceptual-Computing-Lab/openpose

(2). 图像分类classification,作者:Bubbliiiing

https://github.com/bubbliiiing/classification-pytorch

B站对应视频:https://www.bilibili.com/video/BV143411B7wg

(3). 手语教学视频,作者:二碳碳

https://www.bilibili.com/video/BV1XE41137LV

(感谢大佬们的开源项目和教程,都已star加三连)




1、大致思路

方法一: 将视频输入到openpose中,检测出关节点的变化轨迹,将轨迹绘制在一张图片上,把这张图片传到图像分类网络中检测属于哪个动作

视频  ----->  |  openpose  |-----> 关节点运动轨迹图-------> |  图像分类模型  | ----------> 单词分类  

方法二: 将视频输入到openpose中,检测出每一帧中关节点的位置,将多帧进行堆叠,形成一个三维张量,其中两个维度是图片的宽和高,一个维度是时间,然后对这个三维张量使用三维卷积进行训练和预测

视频  ----->  |  openpose  |-----> 多张关节点位置图 ---------> |  堆叠  | --------> 三维张量 -------> |  三维卷积网络  | ----------> 单词分类  



2、环境配置

python:3.7(其他版本会导致openpose无法运行,建议使用anaconda的python环境)
cuda:10
cudnn:7或8应该都行
(配置cuda和cudnn会比较麻烦,如果实在不想配,你可以去openpose的github网站下载使用cpu的版本,这里这个版本应该不支持cpu)


具体的配置环境方式:

(0).python和cuda和cudnn自己装


(1).下载文件

下载代码文件后,再从网盘下载模型和数据文件(没有这些跑不起来),网盘链接:

链接:https://pan.baidu.com/s/1Q2aVVhMhSfWL4qKS9QslkQ 
提取码:abcd 

将从github下载的文件夹和网盘下载的文件夹合并,然后就可以下一步了。 (当然你大可直接找我要u盘拿完整的文件)


(2).安装requirements.txt中的库

cmd进入环境后,cd到项目文件夹下,执行指令:

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

(3).安装torch和torchvision

先下载好torch(1.2.0)和torchvision(0.4.0)的whl文件,下载地址:

链接:https://pan.baidu.com/s/1QIuJfEE5qQFpXY8ZlHeLNQ 
提取码:abcd 

(当然你依旧可直接找我拿u盘)

下载好torch和torchvision的whl文件后,cmd进入环境,cd到下载文件夹下,执行指令:

pip install [torch或torchvision的whl文件的文件名]

(先装torch再装torchvision,不然有可能会报错)



3、测试运行openpose

项目文件夹下有三个文件:

test.py
test_video.py
test_video_track_point.py

分别对应openpose的功能:检测图片、检测视频、检测视频并绘制关节点轨迹
具体的使用方法可以看文件中的注释部分


在test_video_track_point.py中,取消掉最后几行的注释,就可以将绘制的轨迹图送到classification中去做分类检测
(不过现阶段分类器尚未做好)




4、classification的训练和使用

可以看下classification文件夹中的README.md文件,大佬已经在里边讲得很详细了

Dataset and Code for ICCV 2021 paper "Real-world Video Super-resolution: A Benchmark Dataset and A Decomposition based Learning Scheme"

Dataset and Code for RealVSR Real-world Video Super-resolution: A Benchmark Dataset and A Decomposition based Learning Scheme Xi Yang, Wangmeng Xiang,

Xi Yang 91 Nov 22, 2022
(CVPR 2021) Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds

BRNet Introduction This is a release of the code of our paper Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds,

86 Oct 05, 2022
Basic functions manipulating images using the OpenCV library

OpenCV Basic functions manipulating images using the OpenCV library. Reading Ima

Shatha Siala 3 Feb 17, 2022
Run tesseract with the tesserocr bindings with @OCR-D's interfaces

ocrd_tesserocr Crop, deskew, segment into regions / tables / lines / words, or recognize with tesserocr Introduction This package offers OCR-D complia

OCR-D 38 Oct 14, 2022
A tensorflow implementation of EAST text detector

EAST: An Efficient and Accurate Scene Text Detector Introduction This is a tensorflow re-implementation of EAST: An Efficient and Accurate Scene Text

2.9k Jan 02, 2023
Document Layout Analysis

Eynollah Document Layout Analysis Introduction This tool performs document layout analysis (segmentation) from image data and returns the results as P

QURATOR-SPK 198 Dec 29, 2022
Python-based tools for document analysis and OCR

ocropy OCRopus is a collection of document analysis programs, not a turn-key OCR system. In order to apply it to your documents, you may need to do so

OCRopus 3.2k Dec 31, 2022
Multi-choice answer sheet correction system using computer vision with opencv & python.

Multi choice answer correction 🔴 5 answer sheet samples with a specific solution for detecting answers and sheet correction. 🔴 By running the soluti

Reza Firouzi 7 Mar 07, 2022
Code release for our paper, "SimNet: Enabling Robust Unknown Object Manipulation from Pure Synthetic Data via Stereo"

SimNet: Enabling Robust Unknown Object Manipulation from Pure Synthetic Data via Stereo Thomas Kollar, Michael Laskey, Kevin Stone, Brijen Thananjeyan

68 Dec 14, 2022
CNN+LSTM+CTC based OCR implemented using tensorflow.

CNN_LSTM_CTC_Tensorflow CNN+LSTM+CTC based OCR(Optical Character Recognition) implemented using tensorflow. Note: there is No restriction on the numbe

Watson Yang 356 Dec 08, 2022
⛓ marc is a small, but flexible Markov chain generator

About marc (markov chain) is a small, but flexible Markov chain generator. Usage marc is easy to use. To build a MarkovChain pass the object a sequenc

Max Humber 65 Oct 27, 2022
a deep learning model for page layout analysis / segmentation.

OCR Segmentation a deep learning model for page layout analysis / segmentation. dependencies tensorflow1.8 python3 dataset: uw3-framed-lines-degraded-

99 Dec 12, 2022
Captcha Recognition

The objective of this project is to recognize the target numbers in the captcha images correctly which would tell us how good or bad a captcha system has been built.

Mohit Kaushik 5 Feb 20, 2022
Text page dewarping using a "cubic sheet" model

page_dewarp Page dewarping and thresholding using a "cubic sheet" model - see full writeup at https://mzucker.github.io/2016/08/15/page-dewarping.html

Matt Zucker 1.2k Dec 29, 2022
OCR powered screen-capture tool to capture information instead of images

NormCap OCR powered screen-capture tool to capture information instead of images. Links: Repo | PyPi | Releases | Changelog | FAQs Content: Quickstart

575 Dec 31, 2022
scene-linear test images

Scene-Referred Image Collection A collection of OpenEXR Scene-Referred images, encoded as max 2048px width, DWAA 80 compression. All exrs are encoded

Gralk Klorggson 7 Aug 25, 2022
Ocular is a state-of-the-art historical OCR system.

Ocular Ocular is a state-of-the-art historical OCR system. Its primary features are: Unsupervised learning of unknown fonts: requires only document im

228 Dec 30, 2022
MXNet OCR implementation. Including text recognition and detection.

insightocr Text Recognition Accuracy on Chinese dataset by caffe-ocr Network LSTM 4x1 Pooling Gray Test Acc SimpleNet N Y Y 99.37% SE-ResNet34 N Y Y 9

Deep Insight 99 Nov 01, 2022
SCOUTER: Slot Attention-based Classifier for Explainable Image Recognition

SCOUTER: Slot Attention-based Classifier for Explainable Image Recognition PDF Abstract Explainable artificial intelligence has been gaining attention

87 Dec 26, 2022
Source Code for AAAI 2022 paper "Graph Convolutional Networks with Dual Message Passing for Subgraph Isomorphism Counting and Matching"

Graph Convolutional Networks with Dual Message Passing for Subgraph Isomorphism Counting and Matching This repository is an official implementation of

HKUST-KnowComp 13 Sep 08, 2022