Original implementation of the pooling method introduced in "Speaker embeddings by modeling channel-wise correlations"

Last update: Apr 30, 2022

Overview

Speaker-Embeddings-Correlation-Pooling

This is the original implementation of the pooling method introduced in "Speaker embeddings by modeling channel-wise correlations" by T. Stafylakis, J. Rohdin, and L. Burget (Interspeech 2021). The paper, which you may find here, is a result of the collaboration between Omilia - Conversational Intelligence and Brno University of Technology (BUT).

The code is written for TensorFlow 1 (TF1), but it should work with TF2 too. I only provide the code for creating the network, together with the required hyperparameters. The training hyperparameters we used can be found in the paper.

The code is well commented, at least the parts and (hyper-)parameters required for the correlation pooling.

Apart from the experiments provided in the paper, the code allows the user to: (a) combine standard statistics pooling with correlation pooling, by concatenating the two pooling layers into a single one, and (b) extract correlation pooling features from the outputs of all 4 internal ResNet blocks (aka stages) and concatenate them in the pooling layer.
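To make option (a) concrete, here is a minimal NumPy sketch of concatenating statistics pooling with correlation pooling. It is not the repository's TensorFlow code: the function names, the (channels, frames) input layout, the eps value and the tensor sizes are all assumptions made for illustration, and it omits everything the actual network does before the pooling layer.

```python
import numpy as np

def stats_pooling(x, eps=1e-8):
    """Per-channel mean and standard deviation over time. x: (channels, frames)."""
    mean = x.mean(axis=1)
    std = np.sqrt(x.var(axis=1) + eps)
    return np.concatenate([mean, std])

def correlation_pooling(x, eps=1e-8):
    """Off-diagonal upper triangle of the channel-by-channel correlation matrix."""
    centered = x - x.mean(axis=1, keepdims=True)
    normed = centered / np.sqrt(centered.var(axis=1, keepdims=True) + eps)
    corr = normed @ normed.T / x.shape[1]          # (channels, channels)
    rows, cols = np.triu_indices(x.shape[0], k=1)  # skip the all-ones diagonal
    return corr[rows, cols]

def combined_pooling(x):
    """Hypothetical helper: concatenate both poolings into one vector."""
    return np.concatenate([stats_pooling(x), correlation_pooling(x)])

rng = np.random.default_rng(0)
features = rng.standard_normal((8, 200))  # 8 channels, 200 frames (made-up sizes)
pooled = combined_pooling(features)
print(pooled.shape)  # (44,) = 2 * 8 statistics + 8 * 7 / 2 correlations
```

The correlation part grows quadratically with the number of channels, so in this sketch the channel count is kept small on purpose.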

The code could be written more efficiently using tensor-only operators. However, to facilitate research we have implemented it using lists of tensors, e.g. after merging frequency bins into frequency ranges. Despite this inefficiency, we observe no difference in training speed between correlation pooling and standard stats pooling.
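The trade-off between lists of tensors and tensor-only operators can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the repository's implementation: merging bins by averaging, the (channels, freq_bins, frames) layout, the function names and the sizes are all hypothetical choices. The list-based version is easier to modify per frequency range, while the tensor-only version computes the same values in one batched operation.

```python
import numpy as np

def corr_upper(x, eps=1e-8):
    """Upper-triangular channel correlations. x: (channels, frames)."""
    centered = x - x.mean(axis=1, keepdims=True)
    normed = centered / np.sqrt(centered.var(axis=1, keepdims=True) + eps)
    corr = normed @ normed.T / x.shape[1]
    rows, cols = np.triu_indices(x.shape[0], k=1)
    return corr[rows, cols]

def pool_ranges_list(fmap, n_ranges):
    """List-based version. fmap: (channels, freq_bins, frames)."""
    ranges = np.array_split(fmap, n_ranges, axis=1)  # list of (C, bins, T)
    merged = [r.mean(axis=1) for r in ranges]        # assumed merge: average bins
    return np.concatenate([corr_upper(m) for m in merged])

def pool_ranges_tensor(fmap, n_ranges, eps=1e-8):
    """Tensor-only version; assumes freq_bins is divisible by n_ranges."""
    c, f, t = fmap.shape
    merged = fmap.reshape(c, n_ranges, f // n_ranges, t).mean(axis=2)
    merged = merged.transpose(1, 0, 2)               # (ranges, C, T)
    centered = merged - merged.mean(axis=2, keepdims=True)
    normed = centered / np.sqrt(centered.var(axis=2, keepdims=True) + eps)
    corr = normed @ normed.transpose(0, 2, 1) / t    # (ranges, C, C)
    rows, cols = np.triu_indices(c, k=1)
    return corr[:, rows, cols].reshape(-1)

rng = np.random.default_rng(0)
fmap = rng.standard_normal((6, 10, 50))  # made-up sizes
from_list = pool_ranges_list(fmap, n_ranges=5)
from_tensor = pool_ranges_tensor(fmap, n_ranges=5)
print(from_list.shape, np.allclose(from_list, from_tensor))
```

Both functions return one block of correlations per frequency range, concatenated in range order.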

Start with the file train_resnet.py, which creates the ResNet (with the pooling mechanism) and sets its parameters. All parameters are set so that you can reproduce our best-performing experiment (P7 in the paper).

So, try it and let us know what you get!

Themos

Owner

Themos Stafylakis
