Official implementation of "Membership Inference Attacks Against Self-supervised Speech Models"

Last update: Nov 01, 2022

Related tags

Overview

Introduction

Official implementation of "Membership Inference Attacks Against Self-supervised Speech Models".

In this work, we demonstrate that existing self-supervised speech model such as HuBERT, wav2vec 2.0, CPC and TERA are vulnerable to membership inference attack (MIA) and thus could reveal sensitive informations related to the training data.

Requirements

Python >= 3.6
Install sox on your OS
Install s3prl on your OS

git clone https://github.com/s3prl/s3prl
cd s3prl
pip install -e ./

Install the specific fairseq

pip install [email protected]+https://github.com//pytorch/[email protected]#egg=fairseq

Preprocessing

First, extract the self-supervised feature of utterances in each corpus according to your needs.

Currently, only LibriSpeech is available.

BASE_PATH=/path/of/the/corpus
OUTPUT_PATH=/path/to/save/feature
MODEL=wav2vec2
SPLIT=train-clean-100 # you should extract train-clean-100, dev-clean, dev-other, test-clean, test-other

python preprocess_feature_LibriSpeech.py \
    --base_path $BATH_PATH \
    --output_path $OUTPUT_PATH \
    --model $MODEL \
    --split $SPLIT

Speaker-level MIA

After extracting the features, you can apply the attack against the models using either basic attack and improved attack.

Noted that you should run the basic attack to generate the .csv file with similarity scores before performing improved attack.

Basic Attack

SEEN_BASE_PATH=/path/you/save/feature/of/seen/corpus
UNSEEN_BASE_PATH=/path/you/save/feature/of/unseen/corpus
OUTPUT_PATH=/path/to/output/results
MODEL=wav2vec2

python predefined-speaker-level-MIA.py \
    --seen_base_path $SEEN_BATH_PATH \
    --unseen_base_path $UNSEEN_BATH_PATH \
    --output_path $OUTPUT_PATH \
    --model $MODEL \

Improved Attack

python train-speaker-level-similarity-model.py \
    --seen_base_path $UNSEEN_BATH_PATH \
    --output_path $OUTPUT_PATH \
    --model $MODEL \
    --speaker_list "${OUTPUT_PATH}/${MODEL}-customized-speaker-level-attack-similarity.csv"

python customized-speaker-level-MIA.py \
    --seen_base_path $SEEN_BATH_PATH \
    --unseen_base_path $UNSEEN_BATH_PATH \
    --output_path $OUTPUT_PATH \
    --model $MODEL \
    --similarity_model_path "${OUTPUT_PATH}/customized-speaker-similarity-model-${MODEL}.pt"

Utterance-level MIA

The process for utterance-level MIA is similar to that of speaker-level:

Basic Attack

SEEN_BASE_PATH=/path/you/save/feature/of/seen/corpus
UNSEEN_BASE_PATH=/path/you/save/feature/of/unseen/corpus
OUTPUT_PATH=/path/to/output/results
MODEL=wav2vec2

python predefined-utterance-level-MIA.py \
    --seen_base_path $SEEN_BATH_PATH \
    --unseen_base_path $UNSEEN_BATH_PATH \
    --output_path $OUTPUT_PATH \
    --model $MODEL \

Improved Attack

python train-utterance-level-similarity-model.py \
    --seen_base_path $UNSEEN_BATH_PATH \
    --output_path $OUTPUT_PATH \
    --model $MODEL \
    --speaker_list "${OUTPUT_PATH}/${MODEL}-customized-utterance-level-attack-similarity.csv"

python customized-utterance-level-MIA.py \
    --seen_base_path $SEEN_BATH_PATH \
    --unseen_base_path $UNSEEN_BATH_PATH \
    --output_path $OUTPUT_PATH \
    --model $MODEL \
    --similarity_model_path "${OUTPUT_PATH}/customized-utterance-similarity-model-${MODEL}.pt"

Citation

If you find our work useful, please cite:

Official implementation of "Membership Inference Attacks Against Self-supervised Speech Models"

Related tags

Overview

Introduction

Requirements

Preprocessing

Speaker-level MIA

Basic Attack

Improved Attack

Utterance-level MIA

Basic Attack

Improved Attack

Citation

Owner

Wei-Cheng Tseng

Time Series Cross-Validation -- an extension for scikit-learn

Solving SMPL/MANO parameters from keypoint coordinates.

Synthetic Humans for Action Recognition, IJCV 2021

Cosine Annealing With Warmup

Car Price Predictor App used to predict the price of the car based on certain input parameters created using python's scikit-learn, fastapi, numpy and joblib packages.

"Segmenter: Transformer for Semantic Segmentation" reproduced via mmsegmentation

Scikit-learn compatible estimation of general graphical models

Group Activity Recognition with Clustered Spatial Temporal Transformer

A collection of inference modules for fastai2

Official PyTorch implementation of "AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks"

A TensorFlow implementation of FCN-8s

This repository implements variational graph auto encoder by Thomas Kipf.

Code for the paper titled "Generalized Depthwise-Separable Convolutions for Adversarially Robust and Efficient Neural Networks" (NeurIPS 2021 Spotlight).

An open source library for face detection in images. The face detection speed can reach 1000FPS.

Coarse implement of the paper "A Simultaneous Denoising and Dereverberation Framework with Target Decoupling", On DNS-2020 dataset, the DNSMOS of first stage is 3.42 and second stage is 3.47.

Telegram chatbot created with deep learning model (LSTM) and telebot library.

Domain Generalization for Mammography Detection via Multi-style and Multi-view Contrastive Learning

Computational modelling of ray propagation through optical elements using the principles of geometric optics (Ray Tracer)

Code for DeepXML: A Deep Extreme Multi-Label Learning Framework Applied to Short Text Documents

iNAS: Integral NAS for Device-Aware Salient Object Detection