Building a real-time environment that divides webcam frames with OpenCV and classifies the cropped images with a vision transformer fine-tuned on samples from a hybrid dataset for facial emotion recognition.

Last update: Dec 12, 2022

Overview

Visual Transformer for Facial Emotion Recognition (FER)

This project aims to build an efficient Visual Transformer for the Facial Emotion Recognition (FER) task. The project is developed entirely in a Python notebook hosted on Google Colab, with a runtime backed by an NVIDIA P100 GPU.
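The real-time part crops regions out of each webcam frame before classifying them. The sketch below covers only the box arithmetic for such a crop, assuming a face detector has already returned a box as (x, y, w, h) in pixels. The function name `expand_and_clamp` and the `margin` parameter are hypothetical and are not taken from the notebook.

```python
def expand_and_clamp(box, frame_w, frame_h, margin=0.2):
    """Grow a face box by `margin` on every side and clamp it to the frame.

    `box` is (x, y, w, h) in pixels. The result is (x0, y0, x1, y1),
    which can be used to slice a frame as frame[y0:y1, x0:x1].
    """
    x, y, w, h = box
    dx, dy = int(w * margin), int(h * margin)
    x0 = max(0, x - dx)
    y0 = max(0, y - dy)
    x1 = min(frame_w, x + w + dx)
    y1 = min(frame_h, y + h + dy)
    return x0, y0, x1, y1
```

Clamping keeps the slice valid when a face sits near the border of the frame, where an expanded box would otherwise run outside the image.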

Dataset

The dataset covers 8 classes and integrates 3 subsets:

FER-2013: It contains approximately 35,000 grayscale facial images of different expressions, each restricted to 48×48 pixels. Its labels fall into 7 classes: 0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral. The Disgust expression has the fewest images (about 600), while the other labels have nearly 5,000 samples each.
CK+: The Extended Cohn-Kanade (CK+) dataset contains images extracted from 593 video sequences of 123 different subjects, ranging from 18 to 50 years of age with a variety of genders and heritage. Each video shows a facial shift from the neutral expression to a targeted peak expression, recorded at 30 frames per second (FPS) with a resolution of either 640x490 or 640x480 pixels. We do not have the entire dataset: we stored only 1,000 high-variance images from a Kaggle repository.
AffectNet: A large facial expression dataset with 41,000 images classified into eight categories (neutral, happy, angry, sad, fear, surprise, disgust, contempt), along with the intensity of valence and arousal.

Data loading, integration and analysis are in the first part of the ViT-Emotion-Recognition.ipynb notebook. The resulting dataset is split into two subsets (train and val folders), each with 8 subfolders named after the class labels.
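A folder layout like this can be checked with a few lines of standard-library Python. The sketch below assumes a `root/<split>/<class>/` tree and a fixed set of image suffixes. The function name `count_images` and the suffix list are illustrative choices, not code from the notebook.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed image formats


def count_images(root):
    """Return {split: {class_name: n_images}} for a root/<split>/<class>/ layout."""
    counts = {}
    for split_dir in sorted(Path(root).iterdir()):
        if not split_dir.is_dir():
            continue
        counts[split_dir.name] = {
            class_dir.name: sum(
                1 for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    return counts
```

Running it before and after each processing step makes it easy to see whether a step changed the number of samples in any class.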

Data Management

Because we fine-tune a transformer on a heterogeneous dataset, we had to align some image features:

Data Scaling: The pre-trained models are transformers in different configurations, trained on the ImageNet dataset for image classification with 224x224 images. We use the same scale and convert input data to this size.
Data Channels: We use RGB channels for each image, for the same reason as in the previous point.
Data Augmentation: We use brightness, rotation, scaling, translation and zooming augmentation to increase the number of samples and balance the class distribution of the dataset.
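To make the balancing step concrete, the sketch below computes how many augmented samples each class would need to match a target size. It assumes that balancing means bringing every class up to the size of the largest one. The function name `augmentation_plan` and that policy are assumptions, not the notebook's actual procedure.

```python
def augmentation_plan(class_counts, target=None):
    """Return how many augmented samples each class needs to reach `target`.

    If `target` is None, the size of the largest class is used.
    """
    if target is None:
        target = max(class_counts.values())
    return {name: max(0, target - n) for name, n in class_counts.items()}
```

With the rough FER-2013 figures quoted above, `augmentation_plan({"disgust": 600, "happy": 5000})` asks for 4400 extra Disgust samples and none for Happy.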

Model

Overview of the model: The input image is split into fixed-size patches by a convolutional layer with a 16x16 kernel and a stride of 16x16, which precedes the embedding phase. The output of the convolution is then embedded: the resulting vector is the sum of the position embedding and a linear embedding in a 768-dimensional projection space. The embedded patches are processed by a stack of 11 sequential Transformer encoders. For the classification task, the final layer is a linear layer with an 8-dimensional output, one per emotion. The model we rely on is pretrained on ImageNet and fine-tuned on the dataset described above.
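The sizes in this description follow from simple arithmetic, sketched below for a 224x224 RGB input. The function name `vit_patch_shapes` is hypothetical, and the parameter counts assume a convolution and a linear layer that both carry a bias term.

```python
def vit_patch_shapes(image_size=224, patch_size=16, channels=3,
                     embed_dim=768, num_classes=8):
    """Derive patch-embedding and classification-head sizes for a ViT-style model."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    per_side = image_size // patch_size
    values_per_patch = patch_size * patch_size * channels
    return {
        "patches_per_side": per_side,
        "num_patches": per_side * per_side,
        "values_per_patch": values_per_patch,
        # Convolution with kernel = stride = patch size: weights plus biases.
        "patch_embed_params": values_per_patch * embed_dim + embed_dim,
        # Final linear layer: weights plus biases.
        "head_params": embed_dim * num_classes + num_classes,
    }
```

For the defaults this gives 14 patches per side and 196 patches in total. Each 16x16x3 patch holds 768 raw values, which happens to equal the 768-dimensional embedding size.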

Source: https://github.com/google-research/vision_transformer

Authors

Andrea Gurioli (@andreagurioli1995)
Mario Sessa (@kode-git)

License


Comments

Pre-processing phase removes some images
After the data analysis on AVFER, the data from the splitting phase differs after pre-processing. We need to check:

Whether removing the PNG files affects the number of samples

Whether anything changes after the reshaping

Whether the os.remove(fl) call is mis-indented

I need to re-run the data integration and data analysis of AVFER before testing feature variation in the pre-processing phase.
bug
opened by kode-git 2
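The indentation concern can be illustrated with a small cleanup helper. This is a minimal sketch, not the notebook's pre-processing code: the function name `remove_by_suffix`, the `dry_run` flag and the suffix rule are all assumptions made for the example.

```python
import os


def remove_by_suffix(folder, suffix=".png", dry_run=True):
    """Delete files in `folder` whose name ends with `suffix`.

    Returns the names that were removed or, with dry_run=True, would be removed.
    """
    removed = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path) and name.lower().endswith(suffix):
            removed.append(name)
            if not dry_run:
                # Keep the removal inside the suffix check: at loop level
                # it would run for every entry in the folder.
                os.remove(path)
    return removed
```

A dry run first lists what would be deleted, so the sample counts can be compared before anything is actually removed.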

Releases(0.3.12)

0.3.12(May 16, 2022)
Adding presentation and official documentation

Splitting notebook per sections

Adding additional comments to the code

0.3.11(May 14, 2022)
Adding ViT-B/16/S model on 25 epochs with constant learning rate

Checking on training and validation accuracy/loss parameters according to the training log

Display results on standalone plots

vfer_small_25.pth(327.37 MB)
vfer_small_25_history_loss.pkl(490 bytes)
vfer_small_25_history_train.pkl(233 bytes)
vfer_small_25_history_val.pkl(233 bytes)
0.3.10(May 13, 2022)
Adding evaluation for ResNet18

Debugging on SAM model evaluation

Improving the training plot to support curves on N < 5 lines

Adapting the model to its backbone when loading it for standalone evaluation

0.3.9(May 12, 2022)
Adding ResNet 18 (11M parameters)

Upload history for loss and accuracy

Upload epoch 20 dump

Upload final model checkpoint

resnet18_25.pth(42.72 MB)
resnet18_25_history_loss.pkl(490 bytes)
resnet18_25_history_train.pkl(7.05 KB)
resnet18_25_history_val.pkl(7.05 KB)
0.3.8(May 11, 2022)
Adding ViT-B/16/SG

Gradual learning rate every 10 epochs

SGD optimization

Adding loss and accuracy histories

vfer_grad_25.pth(327.37 MB)
vfer_grad_25_history_loss.pkl(490 bytes)
vfer_grad_25_history_train.pkl(233 bytes)
vfer_grad_25_history_val.pkl(233 bytes)
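A learning rate that changes every 10 epochs can be written as a step decay. The sketch below is one possible reading of the 0.3.8 note: the decay factor `gamma=0.1` is an assumption, since the release notes do not state it, and the function name `step_lr` is illustrative.

```python
def step_lr(base_lr, epoch, step_size=10, gamma=0.1):
    """Learning rate at a 0-based `epoch` under step decay.

    The rate is multiplied by `gamma` once every `step_size` epochs.
    """
    return base_lr * gamma ** (epoch // step_size)
```

With a base rate of 0.01, epochs 0 to 9 use 0.01, epochs 10 to 19 use 0.001, and epochs 20 to 24 use 0.0001. PyTorch offers the same schedule as `torch.optim.lr_scheduler.StepLR`.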
0.3.7(May 11, 2022)
Adding VIT-B/16 model checkpoint using customized learning rate scheduler

Adding SAM to the model as an optimization algorithm to smooth the loss landscape

Adding history for training and validation loss

Adding history for training and validation accuracy

vfer_sam_25.pth(327.37 MB)
vfer_sam_25_history_loss.pkl(490 bytes)
vfer_sam_25_history_train.pkl(233 bytes)
vfer_sam_25_history_val.pkl(233 bytes)
0.3.6(May 9, 2022)
Configuration of resnet18 with gradual learning rate

Starting learning rate at 0.01

Epochs 50 with plateau at 25

Loading training and validation accuracy histories

resnet18.pth(44.69 MB)
resnet18_25_history_loss.pkl(490 bytes)
resnet18_history_train.pkl(14.17 KB)
resnet18_history_val.pkl(14.17 KB)
0.3.5(May 9, 2022)
Adding SAM optimization for VIT-B/16

Defining closure for sharpness-aware minimization efficiency

Debugging model loader for the checkpoints recovery

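Sharpness-aware minimization (SAM) evaluates the gradient twice per update, which is why SAM optimizers are usually given a closure that recomputes the loss. The sketch below shows the two-step update on plain Python lists rather than on the ViT. The function name `sam_step` and the values of `lr` and `rho` are assumptions for illustration.

```python
import math


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM update on a list of floats `w`.

    `grad_fn(w)` returns the gradient of the loss at `w`. It is called twice,
    once at `w` and once at the perturbed point.
    """
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(w)  # zero gradient: nothing to perturb or update
    # Step 1: move along the gradient to the edge of an L2 ball of radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Step 2: take the gradient there and apply it to the original weights.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]
```

For the loss `sum(x * x for x in w)`, whose gradient is `[2 * x for x in w]`, one step from `[1.0, 0.0]` moves the first weight to about 0.79. A plain gradient step with the same learning rate would give 0.8.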
0.2.5(May 7, 2022)
Upload optimal model on AffectNet

Defines evaluation plots on accuracy and loss values

vfer_grad_25.pth(327.37 MB)
vfer_grad_25_history_loss.pkl(130 bytes)
vfer_grad_25_history_train.pkl(1.48 KB)
vfer_grad_25_history_val.pkl(1.48 KB)
0.2.4(May 6, 2022)
Adding gradual learning rate

Modify dataset with AffectNet in validation and testing set

Adding scheduler for learning rate adjustment

vfer_grad_50.pth(327.37 MB)
vfer_grad_50_history_train.pkl(2.86 KB)
vfer_grad_50_history_val.pkl(2.86 KB)
0.2.3(Apr 29, 2022)
Extends data analysis for the AffectNet, CK+48 and FER-2013

Creation of AVFER with the following features

Splitting initial dataset in training and testing set with ratio 80/20

Splitting validation and training set with ratio 90/10

Testing and validation set contains only samples from AffectNet (RGB and high quality images)

Drive of AVFER: https://drive.google.com/drive/folders/1-8WG_CNrU3chL_OHpkM8EYx3Bm129cnE?usp=sharing
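The two splitting ratios from release 0.2.3 can be combined as in the sketch below. It reads them as 80/20 for training versus testing, followed by 90/10 for training versus validation on the remaining part. It ignores the rule that the testing and validation sets hold only AffectNet samples. The function name `split_dataset` and the fixed seed are assumptions.

```python
import random


def split_dataset(samples, test_ratio=0.2, val_ratio=0.1, seed=0):
    """Shuffle `samples` and return (train, val, test) lists.

    `test_ratio` is taken from the whole set, `val_ratio` from what remains.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_ratio)
    test, rest = items[:n_test], items[n_test:]
    n_val = round(len(rest) * val_ratio)
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test
```

For 1,000 samples this yields 720 training, 80 validation and 200 testing samples.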
0.2.2(Apr 27, 2022)
Adjust train and test splitting

Balancing augmentation over 150,000 samples

Removing augmentation on validation to increment variability

Loading of vfer for 5, 15 and 25 epochs of training on the result dataset

Loading history for training and validation accuracy/loss

epoch_15_vfer_small_50(327.37 MB)
epoch_15_vfer_small_50.pth(327.37 MB)
epoch_25_vfer_small.pth(327.37 MB)
epoch_25_vfer_small_50(327.37 MB)
epoch_5_vfer_small_50(327.37 MB)
vfer_small_15_on_50_history_loss.pkl(220 bytes)
vfer_small_15_on_50_history_train.pkl(3.00 KB)
vfer_small_15_on_50_history_val.pkl(3.00 KB)
vfer_small_25_on_50_history_loss.pkl(220 bytes)
vfer_small_25_on_50_history_train.pkl(3.00 KB)
vfer_small_25_on_50_history_val.pkl(3.00 KB)
0.2.1(Apr 24, 2022)
Adding integration with partial training during the transformer weights improvements (best-fit)

Updating the VFER model at 5/50 training epochs with 62% accuracy (state of the art for a visual transformer on AffectNet)

Integrating with fluid system for face detection in the cropping phase

epoch_5_vfer_small_50(327.37 MB)
0.2.0(Apr 22, 2022)
Adjust normalization parameters from [0.48, 0.28] to 0.5

Balancing the dataset with no augmented elements in validation

Doubling the size of the training set so that fewer epochs are needed in the training phase

Adding features and inference on OpenCV video capture tools for model applications

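The normalization change in release 0.2.0 can be illustrated per pixel. The sketch below assumes that the value 0.5 is used as both mean and standard deviation on inputs scaled to [0, 1]. That reading of the release note, and the function name `normalize_pixel`, are assumptions.

```python
def normalize_pixel(value, mean=0.5, std=0.5):
    """Map an 8-bit channel value to its normalized form."""
    scaled = value / 255.0        # [0, 255] -> [0.0, 1.0]
    return (scaled - mean) / std  # with mean = std = 0.5 -> [-1.0, 1.0]
```

Under this assumption a black pixel (0) maps to -1.0 and a white pixel (255) maps to 1.0, so every channel is centered on zero with the same range.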
0.1.0(Apr 18, 2022)
Model dump for batch 50 on 12 epochs for the VFER transformer, accuracy of 69%

Model dump for batch 60 on 24 epochs for the VFER transformer, accuracy of 70%

Model dump for batch 60 on 50 epochs for the VFER transformer, accuracy of 71%

Debugging notebook for the loss evaluation

Adding every section until the evaluation

Integration of the dataset available here

vfer_base_12.zip(304.26 MB)
vfer_base_24.zip(304.25 MB)
vfer_base_50.zip(608.51 MB)

Owner

Mario Sessa

Computer Scientist for /dev/null. Master Student in Computer Science.
