CorrProxies - Optimizing Machine Learning Inference Queries with Correlative Proxy Models

Overview

CorrProxies

Declaration

This repository accompanies the paper Optimizing Machine Learning Inference Queries with Correlative Proxy Models.

Setup ENV

Quick Start

  1. We provide a Docker image that is ready to use out of the box.
  2. Alternatively, you can follow the steps below to build your own testing environment.

The Provided Docker Environment

Steps to run the Docker Environment

  • Get the Docker image from this link.
  • Load the Docker image: docker load -i corrproxies-image.tar
  • Run the image in a container: docker run --name=CorrProxies -i -t -d corrproxies-image
    • This prints the ID of the new container, for example d979af9a17f23345cb2894b22dc8527680acdfd7a7e1aaed6a7a28ea134e66e6.
  • Open a shell in the container, using the ID that was printed: docker exec -it d979af9a17f23345cb2894b22dc8527680acdfd7a7e1aaed6a7a28ea134e66e6 /bin/zsh
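The steps above can be sketched as a small Python helper that assembles the three commands. This is only an illustration: the function names docker_commands and as_shell are hypothetical, and the exec step addresses the container by the name given to --name rather than by the printed ID, which Docker also accepts.

```python
import shlex

IMAGE_TAR = "corrproxies-image.tar"  # archive name from the load step
IMAGE = "corrproxies-image"          # image name from the run step
CONTAINER = "CorrProxies"            # value passed to --name in the run step


def docker_commands(image_tar=IMAGE_TAR, image=IMAGE, container=CONTAINER):
    """Return the load, run and exec commands as argument lists."""
    return [
        ["docker", "load", "-i", image_tar],
        ["docker", "run", f"--name={container}", "-i", "-t", "-d", image],
        # Assumption: use the container name here instead of the long ID.
        ["docker", "exec", "-it", container, "/bin/zsh"],
    ]


def as_shell(cmd):
    """Render one argument list as a copy-pasteable shell line."""
    return " ".join(shlex.quote(part) for part in cmd)


# Print the commands instead of running them, so nothing is started by accident.
for cmd in docker_commands():
    print(as_shell(cmd))
```

To execute the commands rather than print them, each list can be passed to subprocess.run(cmd, check=True).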

ENV Spec

File structure:

  • The CorrProxies home directory is /home/CorrProxies.
  • The Python executable is located at /home/anaconda3/envs/condaenv/bin/python3.
  • The models are located in /home/CorrProxies/model.
  • The datasets are located in /home/CorrProxies/data.
  • The startup scripts are located in /home/CorrProxies/scripts.
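As a quick sanity check of the layout listed above, a minimal sketch like the following reports which of those paths are absent. The helper name missing_paths and the short labels are hypothetical; only the paths themselves come from the list.

```python
from pathlib import Path

# Paths from the file structure list, written relative to the filesystem root.
EXPECTED = {
    "home": "home/CorrProxies",
    "python": "home/anaconda3/envs/condaenv/bin/python3",
    "models": "home/CorrProxies/model",
    "datasets": "home/CorrProxies/data",
    "scripts": "home/CorrProxies/scripts",
}


def missing_paths(root="/"):
    """Return the sorted labels of expected paths that do not exist under root."""
    root = Path(root)
    return sorted(
        label for label, rel in EXPECTED.items() if not (root / rel).exists()
    )


# Inside the provided container this should print an empty list.
print(missing_paths())
```

The root parameter exists only so the check can be pointed at a different prefix, for example when the repository is mounted somewhere other than /.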

Build Your Own Environment

These instructions assume a clean installation of a Debian-based Linux distribution (the commands below use apt-get).

  1. Install pre-requisites.

    apt-get update && apt-get install -y build-essential

  2. Install Anaconda.

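    • wget https://repo.anaconda.com/archive/Anaconda3-5.3.1-Linux-x86_64.sh && bash Anaconda3-5.3.1-Linux-x86_64.sh -b -p <anaconda_install_dir>
    • export PATH="<anaconda_install_dir>/bin/:$PATH"
    • Replace <anaconda_install_dir> with the directory you want Anaconda installed in (the Python path in the provided Docker image suggests /home/anaconda3 is used there).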
  3. Install Python 3.6.6 with Anaconda3.

    conda create -n condaenv python=3.6.6

  4. Activate the newly installed Python ENV.

    conda activate condaenv

  5. Install dependencies with pip.

    pip3 install -r requirements.txt

  6. Install Java (openjdk-8), which is needed for Stanford NLP.

    apt-get install -y openjdk-8-jdk
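After the six steps above, a short check like this one can confirm that the active interpreter and Java are visible. It is a sketch under assumptions: the function name env_report is hypothetical, and it only tests whether a java executable is on PATH, not which Java version it is.

```python
import shutil
import sys


def env_report(expected_python=(3, 6, 6)):
    """Summarize whether the interpreter and Java match the setup steps."""
    version = tuple(sys.version_info[:3])
    return {
        "python_version": version,
        # True only when running under the version created in step 3.
        "python_matches": version == tuple(expected_python),
        # Assumption: finding "java" on PATH is enough for a first check.
        "java_on_path": shutil.which("java") is not None,
    }


print(env_report())
```

Run it with the condaenv environment activated so that sys.version_info reflects the interpreter installed in step 3.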

Queries & Datasets

  • We use the Twitter text dataset, the COCO image dataset, and the UCF101 video dataset as our benchmarks. Please see this page for detailed examples of the queries and datasets used in our experiments.

  • After you set up the environment, either manually or with the provided Docker image, the next step is to download the datasets.

    • To get the COCO dataset: cd /home/CorrProxies/data/image/coco && ./get_coco_dataset.sh
    • To get the UCF101 dataset: cd /home/CorrProxies/data/video/ucf101 && wget -c https://www.crcv.ucf.edu/data/UCF101/UCF101.rar && unrar x UCF101.rar

Execution

Pull the latest code before running anything: cd /home/CorrProxies && git pull

Run Operators Individually

To run an individual operator used in our experiments, execute python3 <path_to_operator>. For example: python3 operators/ml_operators/image_video_operators/video_activity_recognition.py

Run Experiments

We use scripts/run.sh to start experiments. The script takes command-line arguments.

  • Text(Twitter)

    • Since we do not provide the text dataset, we skip this experiment.
  • Image(COCO)

    Example: ./scripts/run.sh -w 2 -t 1 -i '1' -a 0.9 -s 3 -o 2 -e 1

  • Video(UCF101)

    Example: ./scripts/run.sh -w 2 -t 2 -i '1' -a 0.9 -s 3 -o 2 -e 1

  • Argument details:

    • -w int: experiment type in [1, 2, 3, 4], referring to /home/CorrProxies/ml_workflow/exps/WorkflowExp*.py;
    • -t int: query type in [0, 1, 2], selecting queries on the Twitter, COCO, and UCF101 datasets, respectively;
    • -i int: query index in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    • -a float: query accuracy;
    • -s int: scheme in [0, 1, 2, 3, 4, 5, 6], selecting the 'ORIG', 'NS', 'PP', 'CORE', 'COREa', 'COREh', and 'REORDER' schemes, respectively;
    • -o int: number of threads used in the optimization phase;
    • -e int: number of threads used in the execution phase, after an optimized plan has been generated.
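To make the value ranges above concrete, here is a minimal Python sketch that mirrors the documented flags with argparse and translates the numeric codes into names. It is not how scripts/run.sh is implemented; build_parser and describe are hypothetical names, and the defaults for -o and -e are assumptions.

```python
import argparse

# Index-to-name tables taken from the argument list above.
DATASETS = ["Twitter", "COCO", "UCF101"]
SCHEMES = ["ORIG", "NS", "PP", "CORE", "COREa", "COREh", "REORDER"]


def build_parser():
    """Build a parser that accepts the same flags and ranges as documented."""
    p = argparse.ArgumentParser(description="Illustrative mirror of the run.sh flags")
    p.add_argument("-w", type=int, choices=range(1, 5), required=True, help="experiment type")
    p.add_argument("-t", type=int, choices=range(0, 3), required=True, help="query type")
    p.add_argument("-i", type=int, choices=range(1, 11), required=True, help="query index")
    p.add_argument("-a", type=float, required=True, help="query accuracy")
    p.add_argument("-s", type=int, choices=range(0, 7), required=True, help="scheme")
    # Assumption: a single thread when the flag is omitted.
    p.add_argument("-o", type=int, default=1, help="optimization threads")
    p.add_argument("-e", type=int, default=1, help="execution threads")
    return p


def describe(argv):
    """Parse argv and return a readable summary of the selected experiment."""
    args = build_parser().parse_args(argv)
    return {
        "experiment": args.w,
        "dataset": DATASETS[args.t],
        "query_index": args.i,
        "accuracy": args.a,
        "scheme": SCHEMES[args.s],
        "optimization_threads": args.o,
        "execution_threads": args.e,
    }


# The COCO example from above: -t 1 selects COCO and -s 3 selects CORE.
print(describe("-w 2 -t 1 -i 1 -a 0.9 -s 3 -o 2 -e 1".split()))
```

Values outside the documented ranges, such as -w 9, make argparse exit with a usage error, which is a convenient way to catch typos before launching a long experiment.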
Owner
ZhihuiYangCS