Unicorn (EuroSys 2022)

Unicorn can be used for performance analyses of highly configurable systems with causal reasoning. Users or developers can query Unicorn for a performance task.

Abstract

Modern computer systems are highly configurable, with the total variability space sometimes larger than the number of atoms in the universe. Because of this vast variability space, understanding and reasoning about the performance behavior of highly configurable systems is challenging. State-of-the-art methods for performance modeling and analyses rely on predictive machine learning models and therefore (i) become unreliable in unseen environments (e.g., different hardware, workloads) and (ii) produce incorrect explanations. To this end, we propose a new method, called Unicorn, which (i) captures intricate interactions between configuration options across the software-hardware stack and (ii) describes how such interactions impact performance variations via causal inference. We evaluated Unicorn on six highly configurable systems, including three on-device machine learning systems, a video encoder, a database management system, and a data analytics pipeline. The experimental results indicate that Unicorn outperforms state-of-the-art performance optimization and debugging methods. Furthermore, unlike the existing methods, the learned causal performance models reliably predict performance for new environments.

Pre-requisites

  • python 3.6
  • json
  • pandas
  • numpy
  • flask
  • causalgraphicalmodels
  • causalnex
  • graphviz
  • py-causal
  • causality

Please run the following commands to have your system ready to run Unicorn:

git clone https://github.com/softsys4ai/unicorn.git
cd unicorn
pip install pandas
pip install numpy
pip install flask
pip install causalgraphicalmodels
pip install causalnex
pip install graphviz
pip install py-causal
pip install causality
pip install tensorflow-gpu==1.15
pip install keras
pip install torch==1.4.0 torchvision==0.5.0

How to use Unicorn

Unicorn can be used for performing different tasks such as performance optimization and performance debugging. Unicorn supports both offline and online modes. In the offline mode, Unicorn can be run on any device using previously measured configurations. In the online mode, measurements are performed directly on NVIDIA Jetson Xavier, NVIDIA Jetson TX2, and NVIDIA Jetson TX1 devices. Collecting measurements from these devices requires sudo privileges, because the device must be set to a new configuration before each measurement.

Debugging (offline)

Unicorn supports debugging and fixing single-objective and multi-objective performance faults. It also supports root-cause analysis of these faults, such as determining the accuracy of the recommended fixes and computing their gain.

Single-objective debugging

To debug single-objective faults in the offline mode using Unicorn please use the following command:

python unicorn_debugging.py  -o objective -s softwaresystem -k hardwaresystem -m mode

Example

To debug single-objective latency faults for Xception in JETSON TX2 in the offline mode using Unicorn please use the following command:

python unicorn_debugging.py  -o inference_time -s Xception -k TX2 -m offline

To debug single-objective energy faults for Bert in JETSON Xavier in the offline mode using Unicorn please use the following command:

python unicorn_debugging.py  -o total_energy_consumption -s Bert -k Xavier -m offline

Multi-objective debugging

To debug multi-objective faults in the offline mode using Unicorn please use the following command:

python unicorn_debugging.py  -o objective1 -o objective2 -s softwaresystem -k hardwaresystem -m mode

Example

To debug multi-objective latency and energy faults for Deepspeech in JETSON TX2 in the offline mode using Unicorn please use the following command:

python unicorn_debugging.py  -o inference_time -o total_energy_consumption -s Deepspeech  -k TX2 -m offline

Optimization (offline)

Unicorn supports single-objective and multi-objective optimization.

Single-objective optimization

To run single-objective optimization in the offline mode using Unicorn please use the following command:

python unicorn_optimization.py  -o objective -s softwaresystem -k hardwaresystem -m mode

Example

To run single-objective latency optimization for Xception in JETSON TX2 in the offline mode using Unicorn please use the following command:

python unicorn_optimization.py  -o inference_time -s Xception -k TX2 -m offline

To run single-objective energy optimization for Bert in JETSON Xavier in the offline mode using Unicorn please use the following command:

python unicorn_optimization.py  -o total_energy_consumption -s Bert -k Xavier -m offline

Multi-objective optimization

To run multi-objective optimization in the offline mode using Unicorn please use the following command:

python unicorn_optimization.py  -o objective1 -o objective2 -s softwaresystem -k hardwaresystem -m mode

Example

To run multi-objective latency and energy optimization for Deepspeech in JETSON TX2 in the offline mode using Unicorn please use the following command:

python unicorn_optimization.py  -o inference_time -o total_energy_consumption -s Deepspeech  -k TX2 -m offline

Transferability

Unicorn supports both single-objective and multi-objective transferability. However, multi-objective transferability is not comprehensively investigated in this version. To determine the single-objective transferability of Unicorn, use the following command:

python unicorn_transferability.py  -o objective -s softwaresystem -k hardwaresystem

Example

To run single-objective latency transferability for Xception in JETSON TX2 in the offline mode using Unicorn please use the following command:

python unicorn_transferability.py  -o inference_time -s Xception -k TX2 -m offline

To run single-objective energy transferability for Bert in JETSON Xavier in the offline mode using Unicorn please use the following command:

python unicorn_transferability.py  -o total_energy_consumption -s Bert -k Xavier -m offline

Data generation

To run experiments on NVIDIA Jetson Xavier, NVIDIA Jetson TX2, or NVIDIA Jetson TX1 devices for a particular software system, a Flask app must first be launched. Please use the following command to start the app on localhost:

python run_service.py softwaresystem

For example, to initialize the Flask app for the Xception software system, please use:

python run_service.py Image

Once the Flask app is running and the model server is ready, please use the following command to collect performance measurements for different configurations:

python run_params.py softwaresystem
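
As a minimal end-to-end sketch of the online workflow, the two steps can be run in separate terminals on the Jetson device. This sketch assumes that run_params.py takes the same software-system argument as run_service.py (here "Image" for Xception) and that the sudo requirement mentioned above applies to the measurement step; adjust it to your setup.

# Terminal 1: start the Flask model server for the Xception (Image) software system
python run_service.py Image

# Terminal 2: once the model server is ready, collect measurements for different configurations;
# sudo is assumed here because the device is set to a new configuration before each measurement
sudo python run_params.py Image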

Unicorn usage on a different dataset

To run Unicorn on a different dataset, you will only need unicorn_debugging.py and unicorn_optimization.py. In the online mode, to perform interventions using the recommended configuration, you need to develop your own utilities (similar to run_params.py). Additionally, you need to make some changes in etc/config.yml to set the configuration options and their values accordingly. The necessary steps are the following:

Step 1: Update init_dir in config.yml with the directory where the initial data is stored.

Step 2: Update bug_dir in config.yml with the directory where bug data is stored.

Step 3: Update the output_dir variable in config.yml with the directory where you want to save the output data.

Step 4: Update hardware_columns in the config.yml with the hardware configuration options you want to use.

Step 5: Update kernel_columns in the config.yml with the kernel configuration options you want to use.

Step 6: Update perf_columns in the config.yml with the events you want to track using perf. If you use any other monitoring tool you need to update it accordingly.

Step 7: Update measurment_colums in config.yml based on the performance objectives you want to use for resolving faults.

Step 8: Update the is_intervenable variables in config.yml with the configuration options you want to use, and set their values to True or False based on your application. True indicates that a configuration option can be intervened upon; False indicates that it cannot.

Step 9: Update the option_values variables in config.yml based on the allowable values each option can take.
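
For reference, here is a minimal sketch of what etc/config.yml might look like after Steps 1-9. The keys follow the steps above, but the directory paths, option names, values, and the nested structure of is_intervenable and option_values are illustrative assumptions; replace them with the options and values of your own system.

init_dir: data/initial/                 # Step 1: directory with the initial data (illustrative path)
bug_dir: data/bugs/                     # Step 2: directory with the bug data (illustrative path)
output_dir: data/output/                # Step 3: directory where the output data is saved
hardware_columns: [core_freq, gpu_freq, emc_freq]              # Step 4: hardware options (example names)
kernel_columns: [swappiness, vm_dirty_ratio]                   # Step 5: kernel options (example names)
perf_columns: [cache_misses, context_switches]                 # Step 6: perf events to track (examples)
measurment_colums: [inference_time, total_energy_consumption]  # Step 7: performance objectives
is_intervenable:                        # Step 8: which options can be intervened upon (assumed mapping)
  core_freq: True
  swappiness: True
  cache_misses: False
option_values:                          # Step 9: allowable values per intervenable option (examples)
  core_freq: [345600, 1113600, 2035200]
  swappiness: [10, 60, 100]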

At this stage, you can run unicorn_debugging.py and unicorn_optimization.py with your own specification. Please note that you also need to update the directories in the data directory according to your software and hardware names. If you change the names of the variables in the config file, or use a new config file, you need to make the corresponding changes in unicorn_debugging.py and unicorn_optimization.py.

How to cite

If you use Unicorn or the dataset in this repository in your research, please cite the following:

@article{iqbalcadet,
  title={CADET: A Systematic Method For Debugging Misconfigurations using Counterfactual Reasoning},
  author={Iqbal, Md Shahriar and Krishna, Rahul and Javidian, Mohammad Ali and Ray, Baishakhi and Jamshidi, Pooyan}
}

Contacts

Please feel free to contact us via email if you find any issues or have any feedback. Thank you for using Unicorn.

| Name | Email |
|---|---|
| Md Shahriar Iqbal | [email protected] |

📘   License

Unicorn is released under the terms of the MIT License.

Comments
  • Evaluation of Source Environments

    Need to determine the transfer learning pipeline. Determine the following:
    • How good is the source modeling?
    • How much update is needed?
    • Explainability (what are the changes across environments)
    • Experiments with different source budgets

    opened by iqbal128855 0
  • Structure Learning

    Enrich the causal models with a Functional Causal Model (FCM) using CGNN, and work on visualization for the FCM. Update the causal model with the Causal Interaction model and compare with CGNN. Comparison of CGNN, FCI (entropic calculation), and the Causal Interaction model. If we use CGNN, we need to find the correct strategy:
    • how to find the initial skeleton?

    opened by iqbal128855 0
  • Run MLPerf benchmark with Facebook DLRM.

    Run MLPerf Benchmark with Facebook DLRM on different hardware (Jetson Xavier and TX2, Possibly on GPU cloud). Change software (RMC1, RMC2, and RMC3) and change workload (single stream, multi-stream and offline, varying number of queries for inference.)

    opened by iqbal128855 0
  • Run Scalability experiments with Facebook DLRM systems.

    • Performance analysis of the Facebook DLRM systems with different configurations. Show how difficult it is to debug misconfigurations in real-world production systems and discuss challenges. Discuss the richness of the performance landscape (more complex behavior).
    • Run CAUPER, BugDoc, SMAC, DeltaDebugging, Encore, and CBI on the DLRM fault dataset and evaluate using the ground truth dataset for both single- and multi-objective performance faults.
    • Show proof of scalability of CAUPER in the Facebook DLRM system with a high number of allowable values taken by different configuration options.
    • Write about the evaluation of the Facebook DLRM systems. Analyze by 3 slices: latency, energy, and heat.

    opened by iqbal128855 0
  • Update the ground truth datasets for each type of performance fault.

    Update ground truth for each fault by using the configurations that provide 80% or more gain and recompute accuracy, precision, and recall with a confidence interval.

    opened by iqbal128855 0
  • Update Causal Structure Learning Algorithm.

    • Use FCI with the entropic approach to resolve edges.
    • Break down the computational effort required for causal structure discovery, computing path causal effects, computing individual treatment effects, and measuring recommended configurations.

    opened by iqbal128855 0
  • More comparisons

    | Method | Where? | When | Link |
    |---|---|---|---|
    | ∆LDA | ECML | 2007 | http://pages.cs.wisc.edu/~jerryzhu/ssl/pub/rlda.pdf |
    | SmartConf | ASPLOS | 2018 | https://people.cs.uchicago.edu/~hankhoffmann/autoconf.pdf |
    | BestConfig | SoCC | 2017 | https://arxiv.org/pdf/1710.03439.pdf |
    | LEO | SIGARCH | 2015 | https://dl.acm.org/doi/pdf/10.1145/2786763.2694373 |

    opened by onkfotocer 0
  • Real world case study with a self-driving car system composition

    Use Fig. 3 from https://www.bdti.com/InsideDSP/2017/03/14/NVIDIA to explain a real-world scenario, and use https://forums.developer.nvidia.com/t/cuda-performance-issue-on-tx2/50477 to show that it works.

    opened by onkfotocer 0
  • Policies for handling edge-type mismatches

    When are the policies applied?

    • Bi-directed & no-edge → we get a confidence score; whichever edge direction has the highest confidence, use that direction.
    • Un-directed edge & no-edge → no edge
    • Tail has a bubble and head has arrow → keep the directed edge and remove the bubble
    • No-edge & edge → edge
    • No-edge & no-edge → no-edge

    When are the policies applied?

    Bubble/un-directed edge: selection variables
    Bi-directed edge: hidden variables

    When are the policies applied?

    1. Case 1: Greedy: apply the above rules at every step
      • At each iteration there is a DAG (say DAG_t, DAG_t-1, ...)
      • If there are conflicts, keep counts of how many times an edge a->b, b->a, or a--/--b appears, and use the one with the max count.
    2. Case 2: Apply at the end.
    Experiment 
    opened by rahlk 0
  • How to resolve bi-directed edges and cycles in the causal graph?

    • [ ] Randomly -- not an appropriate answer for the reviewer
    • [ ] Use FCI/FGS/PC (besides expert knowledge) which makes much looser assumptions about causal sufficiency to inform NOTEARS
    opened by rahlk 0
Releases: EuroSys2022

Owner: AISys Lab (Artificial Intelligence and Systems Laboratory)