Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition

Last update: Nov 12, 2022

Overview

Light-SERNet

This is the TensorFlow 2.x implementation of our paper "Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition", submitted to ICASSP 2022.

In this paper, we propose an efficient and lightweight fully convolutional neural network (FCNN) for speech emotion recognition on systems with limited hardware resources. The proposed FCNN extracts various feature maps through three parallel paths with different filter sizes. This helps the deep convolution blocks extract high-level features while ensuring sufficient separability. The extracted features are then used to classify the emotion of the input speech segment. Although our model is smaller than state-of-the-art models, it achieves higher performance on the IEMOCAP and EMO-DB datasets.
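To make the idea of parallel paths with different filter sizes concrete, here is a minimal sketch in plain Python, so it runs without TensorFlow. It is not the model code from this repository: the kernel shapes (3x1, 1x3, 3x3), the fixed averaging kernels, and the function names are all assumptions chosen for illustration, whereas the real network learns its kernels.

```python
def conv2d_same(image, kernel):
    """Slide a 2-D kernel over a 2-D list, zero-padding so the output keeps the input size."""
    rows, cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    pad_r, pad_c = k_rows // 2, k_cols // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for i in range(k_rows):
                for j in range(k_cols):
                    rr, cc = r + i - pad_r, c + j - pad_c
                    # Positions outside the input count as zero padding
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += image[rr][cc] * kernel[i][j]
            out[r][c] = total
    return out


def averaging_kernel(k_rows, k_cols):
    """Stand-in for a learned kernel: every weight is 1 / (kernel area)."""
    value = 1.0 / (k_rows * k_cols)
    return [[value] * k_cols for _ in range(k_rows)]


def parallel_paths(image, kernel_shapes=((3, 1), (1, 3), (3, 3))):
    """Run one kernel per path on the same input and stack the results as channels.

    The kernel shapes are illustrative assumptions, not the values used in the paper.
    """
    return [conv2d_same(image, averaging_kernel(*shape)) for shape in kernel_shapes]


# Toy 4x5 "spectrogram": rows are frequency bins, columns are time frames
features = [[float(r * 5 + c) for c in range(5)] for r in range(4)]
maps = parallel_paths(features)
print(len(maps), len(maps[0]), len(maps[0][0]))  # 3 4 5
```

Each path sees the same input but covers a different neighbourhood: along frequency only, along time only, or both. Because all paths keep the input size, their outputs can be stacked as channels before the later convolution blocks.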

Run

1. Clone Repository

$ git clone https://github.com/AryaAftab/LIGHT-SERNET.git
$ cd LIGHT-SERNET/

2. Requirements

TensorFlow >= 2.3.0
NumPy >= 1.19.2
tqdm >= 4.50.2
Matplotlib >= 3.3.1
scikit-learn >= 0.23.2

$ pip install -r requirements.txt
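If you want to confirm that an existing environment already satisfies these minimum versions, a small check along the following lines may help. This is a sketch and not part of the repository: the helper names are hypothetical, the version parsing is deliberately simple (it compares only the leading integer components), and the dictionary keys are assumed to be the PyPI distribution names of the packages listed above.

```python
from importlib import metadata

# Minimum versions copied from the requirements list; keys are assumed PyPI names.
MINIMUM_VERSIONS = {
    "tensorflow": "2.3.0",
    "numpy": "1.19.2",
    "tqdm": "4.50.2",
    "matplotlib": "3.3.1",
    "scikit-learn": "0.23.2",
}


def version_tuple(version):
    """Turn '2.3.0' or '2.3.0rc1' into a tuple of leading integers, e.g. (2, 3, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_requirements(minimums=MINIMUM_VERSIONS):
    """Return {package: status}, where status is 'ok', 'too old', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = version_tuple(installed) >= version_tuple(minimum)
        report[name] = "ok" if ok else "too old"
    return report


if __name__ == "__main__":
    for package, status in check_requirements().items():
        print(f"{package}: {status}")
```

Comparing integer tuples rather than raw strings matters here, because a plain string comparison would rank "2.10.0" below "2.3.0".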

3. Data

Download the EMO-DB and IEMOCAP (access requires permission) datasets and extract them into the data folder.

4. Prepare datasets

Use the following command to segment each dataset into clips of the desired length (in seconds):

$ python utils/segment/segment_dataset.py -dp data/{dataset_folder} -ip utils/DATASET_INFO.json -d {datasetname_in_jsonfile} -l {desired_size(seconds)}

For example, for the EMO-DB dataset:

$ python utils/segment/segment_dataset.py -dp data/EMO-DB -ip utils/DATASET_INFO.json -d EMO-DB -l 3
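As a rough illustration of what fixed-length segmentation means, the sketch below cuts a waveform into equal-length pieces and zero-pads the last one. It does not reproduce segment_dataset.py: this README does not say whether that script pads, truncates, or overlaps segments, so the padding strategy and the function name here are assumptions.

```python
def segment_waveform(samples, sample_rate, segment_seconds):
    """Split samples into equal-length segments, zero-padding the last one.

    Zero-padding is an assumption for illustration, not the repository's
    documented behaviour.
    """
    segment_length = int(sample_rate * segment_seconds)
    if segment_length <= 0:
        raise ValueError("segment length must be positive")
    segments = []
    for start in range(0, len(samples), segment_length):
        chunk = list(samples[start:start + segment_length])
        # Pad a short final chunk with zeros so every segment has the same length
        chunk += [0.0] * (segment_length - len(chunk))
        segments.append(chunk)
    return segments


# Toy example: 7 samples at 2 Hz cut into 2-second (4-sample) segments
toy = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
for seg in segment_waveform(toy, sample_rate=2, segment_seconds=2):
    print(seg)
```

Equal-length inputs are what allow the utterances to be batched together for training, which is presumably why the -l option takes a single length in seconds for the whole dataset.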

5. Set hyperparameters and training config

To set the hyperparameters and the training configuration, you only need to change the constants in hyperparameters.py.

6. Start training

Use the following command to train the model on the desired dataset with the desired cost function.

Note 1: The dataset name is the name of the dataset folder after segmentation.
Note 2: The confusion matrix results are saved in the result folder.

$ python train.py -dn {dataset_name_after_segmentation} -ln {cost_function_name}

For example, for the EMO-DB dataset:

$ python train.py -dn EMO-DB_3s_Segmented -ln focal
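For readers unfamiliar with the focal cost function selected by `-ln focal`, the sketch below shows the standard focal loss for a single example in plain Python. It is an assumption that the repository follows this standard form. The value gamma = 2.0, the absence of class weighting, and the function names are illustrative choices and are not taken from train.py.

```python
import math


def focal_loss(probabilities, true_index, gamma=2.0):
    """Standard focal loss for one example: -(1 - p_t)^gamma * log(p_t).

    probabilities: predicted class probabilities (summing to 1).
    true_index: index of the correct class.
    gamma: focusing parameter; 2.0 is an assumed, commonly used value.
    """
    p_t = max(probabilities[true_index], 1e-12)  # guard against log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def cross_entropy(probabilities, true_index):
    """Cross-entropy is the special case gamma = 0."""
    return focal_loss(probabilities, true_index, gamma=0.0)


easy = [0.9, 0.05, 0.05]  # confident and correct for class 0
hard = [0.2, 0.5, 0.3]    # wrong about class 0

for name, probs in (("easy", easy), ("hard", hard)):
    print(name, cross_entropy(probs, 0), focal_loss(probs, 0))
```

The factor (1 - p_t)^gamma shrinks the loss of well-classified examples much more than that of misclassified ones, so training focuses on the harder examples. This can be useful when some emotion classes are under-represented.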

Citation

If you find our code useful for your research, please consider citing:

@article{aftab2021light,
  title={Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition},
  author={Aftab, Arya and Morsali, Alireza and Ghaemmaghami, Shahrokh and Champagne, Benoit},
  journal={arXiv preprint arXiv:2110.03435},
  year={2021}
}
