Research Artifact of USENIX Security 2022 Paper: Automated Side Channel Analysis of Media Software with Manifold Learning

Overview

Manifold-SCA

The repo is organized as:

📂manifold-sca
 ┣ 📂vulnerability
 ┃ ┣ 📂contribution
 ┃ ┣ 📜{dataset}-{program}-count.json
 ┃ ┗ 📜{program}.dis
 ┣ 📂code
 ┃ ┣ 📂SCA
 ┃ ┣ 📂tools
 ┃ ┗ 📂pp
 ┣ 📂audio
 ┗ 📂output

Code

We release our code in folder code. The implementation of our framework is in folder code/SCA, and the tools we use to process input/output data are in folder code/tools. To launch Prime+Probe, you can use the code in code/pp.

Attack

To prepare the training data for learning the data manifold, you first need to instrument the binary with the released pintool code/tools/pinatrace.cpp. When the binary processes a media input, you will get a sequence of records of the form instruction address: accessed address. You then need to fold the sequence of accessed addresses into a matrix and convert the matrix to a suitable format (e.g., a tensor or a numpy array).
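The folding step can be sketched as follows. This is a minimal illustration, not the released preprocessing code: it assumes each trace line looks like "instruction address: accessed address" (optionally with an R/W marker in between), and the helper names parse_trace and fold_trace, the zero padding, and the square side length are all hypothetical choices.

```python
def parse_trace(lines):
    """Parse 'ip: [R|W] addr' style lines into (instruction_addr, accessed_addr) pairs.

    Assumption: the last whitespace-separated token after the colon is the
    accessed address in hex. Lines that do not fit (e.g. '#eof') are skipped.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        ip_part, sep, rest = line.partition(":")
        tokens = rest.split()
        if not sep or not tokens:
            continue
        records.append((int(ip_part, 16), int(tokens[-1], 16)))
    return records


def fold_trace(records, side):
    """Fold accessed addresses into a side x side matrix (list of rows).

    The sequence is truncated or zero-padded to side * side entries
    (a hypothetical choice; use whatever length your model expects).
    """
    addrs = [addr for _, addr in records][: side * side]
    addrs += [0] * (side * side - len(addrs))
    return [addrs[r * side:(r + 1) * side] for r in range(side)]


if __name__ == "__main__":
    sample = ["0x400500: R 0x7ffc0010", "0x400504: W 0x7ffc0018", "#eof"]
    print(fold_trace(parse_trace(sample), side=2))
```

The resulting list of rows can then be converted with numpy.asarray(matrix) or torch.tensor(matrix), depending on what the training scripts expect.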

We release the scripts for training the framework in folder code/SCA. Before training you need to first customize data paths in each script. The training procedure ends after 100 epochs and takes less than 24 hours on one Nvidia GeForce RTX 2080 GPU.

Localize

Recall that we localize vulnerabilities by pinpointing the records in a trace that contribute most to reconstructing media data. So, to perform localization, you first need to train the framework as described above.

After training the framework, run code/localize.py and code/pinpoint.py to localize records in a side channel trace. Note that what you get in this step are several accessed addresses with their indexes in the trace. You then need to look up the corresponding instruction addresses in the instrumentation output you generated when preparing the training data.
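The lookup from trace indexes back to instruction addresses can be sketched as below. This is only an illustration under assumptions: it presumes the instrumentation output has already been parsed into a list of (instruction address, accessed address) pairs in trace order, and that the localized indexes refer to positions in that same list. The function name indexes_to_instructions is hypothetical.

```python
def indexes_to_instructions(records, indexes):
    """Map localized trace indexes to (index, instruction_addr, accessed_addr).

    records: list of (instruction_addr, accessed_addr) pairs in trace order.
    indexes: positions reported by the localization step.
    Indexes outside the trace (e.g. in a padded region) are skipped.
    """
    result = []
    for i in indexes:
        if 0 <= i < len(records):
            ip, addr = records[i]
            result.append((i, hex(ip), hex(addr)))
    return result


if __name__ == "__main__":
    records = [(0x400500, 0x7FFC0010), (0x400504, 0x7FFC0018)]
    for row in indexes_to_instructions(records, [1, 5]):
        print(row)
```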

We release the localized vulnerabilities in folder vulnerability. In folder vulnerability/contribution, we list the instruction addresses of the records that contribute most to the reconstruction of media data. We further map the pinpointed instructions back to their enclosing functions, which we regard as side-channel vulnerable functions. We list the results in {dataset}-{program}-count.json, where a higher count indicates a higher likelihood of being vulnerable.
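The mapping from pinpointed instructions to per-function counts can be sketched as follows. This is not the script used to produce the released JSON files: it assumes you have extracted function start addresses and names yourself (for example from {program}.dis), and it attributes each instruction to the nearest function start at or below it. The function names in the example are made up.

```python
import bisect
from collections import Counter


def count_functions(instr_addrs, symbols):
    """Count how many pinpointed instructions fall into each function.

    instr_addrs: iterable of instruction addresses (ints).
    symbols: list of (start_addr, function_name); order does not matter.
    Caveat: without function end addresses, anything past the last
    function start is attributed to that last function.
    """
    symbols = sorted(symbols)
    starts = [start for start, _ in symbols]
    counts = Counter()
    for ip in instr_addrs:
        pos = bisect.bisect_right(starts, ip) - 1
        if pos >= 0:  # addresses below the first function are ignored
            counts[symbols[pos][1]] += 1
    return counts


if __name__ == "__main__":
    symbols = [(0x2000, "idct_block"), (0x1000, "decode_frame")]  # hypothetical names
    print(count_functions([0x1004, 0x2010, 0x2020, 0x0500], symbols).most_common())
```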

Although each program is evaluated on different datasets, the vulnerabilities localized in the same program are highly consistent across datasets.

Prime+Probe

We use Mastik to launch Prime+Probe on the L1 cache of an Intel Xeon CPU and an AMD Ryzen CPU. We release our scripts in folder code/pp.

The experiment runs on Linux. You first need to install taskset and cpuset.

We assume the victim and the spy are on the same CPU core and no other process is running on this core. To isolate a CPU core, run sudo cset shield --cpu {cpu_id}.

Then run sudo cset shield --exec python run_pp.py -- {cpu_id} {segment_id}. Note that we separate the media data into several segments to speed up side channel collection. code/pp/run_pp.py runs code/pp/pp_audio.py with taskset. code/pp/pp_audio.py is the coordinator: it runs the spy and the victim on the same CPU core simultaneously and saves the collected cache set accesses.
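If you want to collect several segments in one go, the two commands above can be wrapped in a small driver. The sketch below is hypothetical and not part of the released scripts: it only assembles the command lines quoted above and, by default, prints them instead of executing them. The segment ids are placeholders, and the driver is assumed to be started from code/pp so that run_pp.py is found.

```python
import subprocess


def build_commands(cpu_id, segment_id):
    """Return the shield command and the collection command as argument lists."""
    shield = ["sudo", "cset", "shield", "--cpu", str(cpu_id)]
    collect = ["sudo", "cset", "shield", "--exec", "python", "run_pp.py",
               "--", str(cpu_id), str(segment_id)]
    return shield, collect


def run(cpu_id, segment_ids, dry_run=True):
    """Shield the core once, then collect each segment in turn.

    With dry_run=True (the default) the commands are only printed.
    """
    shield, _ = build_commands(cpu_id, 0)
    commands = [shield] + [build_commands(cpu_id, s)[1] for s in segment_ids]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run(cpu_id=3, segment_ids=[0, 1])  # placeholder core and segment ids
```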

Audio

We upload all 2,552 audio clips reconstructed by our framework under Prime+Probe to folder audio/sc09-pp for result verification. Each clip is named {Number}_{hash}_{index}.wav, where {Number} is the content of the corresponding reference input. For example, for the reconstructed audio One_94de6a6a_nohash_1.wav, the number spoken in the reference input is one. As we reported in the paper, most (~80%) of the clips have contents (i.e., the numbers) consistent with the reference inputs.
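When checking the clips, the reference number can be read directly from the file name. The helper below is a small sketch based only on the naming scheme described above; the function name and the lowercasing are illustrative choices.

```python
import os


def reference_number(path):
    """Return the {Number} field of a {Number}_{hash}_{index}.wav file name, lowercased."""
    name = os.path.basename(path)
    stem, ext = os.path.splitext(name)
    if ext.lower() != ".wav" or "_" not in stem:
        raise ValueError(f"unexpected file name: {name}")
    return stem.split("_", 1)[0].lower()


if __name__ == "__main__":
    print(reference_number("audio/sc09-pp/One_94de6a6a_nohash_1.wav"))
```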

Output

We upload the media data reconstructed by our framework to folder output.

Owner
Yuanyuan Yuan