A wrapper around SageMaker ML Lineage Tracking extending ML Lineage to end-to-end ML lifecycles, including additional capabilities around Feature Store groups, queries, and other relevant artifacts.

Overview

ML Lineage Helper

This library wraps the SageMaker SDK to make lineage tracking easier across the ML lifecycle. Lineage artifacts include data, code, feature groups, features in a feature group, feature group queries, training jobs, and models.

Install

pip install git+https://github.com/aws-samples/ml-lineage-helper

Usage

Import ml_lineage_helper.

from ml_lineage_helper import *
from ml_lineage_helper.query_lineage import QueryLineage

Creating and Displaying ML Lineage

Lineage tracking can tie the following together into a single lineage, represented as a directed acyclic graph (DAG): a SageMaker Processing job, the raw data being processed, the processing code, the query you ran against the Feature Store to fetch your training and test sets, the training and test data in S3, and the training code.

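# estimator_or_training_job_name, model_name, query, and preprocessing_job_description
# are assumed to be defined earlier in your notebook.
ml_lineage = MLLineageHelper()
lineage = ml_lineage.create_ml_lineage(estimator_or_training_job_name, model_name=model_name,
                                       query=query, sagemaker_processing_job_description=preprocessing_job_description,
                                       feature_group_names=['customers', 'claims'])
# Display the resulting lineage DataFrame
lineage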

If you cloned your code from a version control hosting platform such as GitHub or GitLab, ml_lineage_helper can associate the URLs of the code with the artifacts it creates:

import os

# Get repo links to processing and training code
processing_code_repo_url = get_repo_link(os.getcwd(), 'processing.py')
training_code_repo_url = get_repo_link(os.getcwd(), 'pytorch-model/train_deploy.py', processing_code=False)
repo_links = [processing_code_repo_url, training_code_repo_url]

# Create lineage
ml_lineage = MLLineageHelper()
lineage = ml_lineage.create_ml_lineage(estimator, model_name=model_name,
                                       query=query, sagemaker_processing_job_description=preprocessing_job_description,
                                       feature_group_names=['customers', 'claims'],
                                       repo_links=repo_links)
lineage

| Name/Source | Association | Name/Destination | Artifact Source ARN | Artifact Destination ARN | Source URI | Base64 Feature Store Query String | Git URL |
| --- | --- | --- | --- | --- | --- | --- | --- |
| pytorch-hosted-model-2021-08-26-15-55-22-071-aws-training-job | Produced | Model | arn:aws:sagemaker:us-west-2:000000000000:experiment-trial-component/pytorch-hosted-model-2021-08-26-15-55-22-071-aws-training-job | arn:aws:sagemaker:us-west-2:000000000000:artifact/013fa1be4ec1d192dac21abaf94ddded | None | None | None |
| TrainingCode | ContributedTo | pytorch-hosted-model-2021-08-26-15-55-22-071-aws-training-job | arn:aws:sagemaker:us-west-2:000000000000:artifact/902d23ff64ef6d85dc27d841a967cd7d | arn:aws:sagemaker:us-west-2:000000000000:experiment-trial-component/pytorch-hosted-model-2021-08-26-15-55-22-071-aws-training-job | s3://sagemaker-us-west-2-000000000000/pytorch-hosted-model-2021-08-26-15-55-22-071/source/sourcedir.tar.gz | None | https://gitlab.com/bwlind/ml-lineage-tracking/blob/main/ml-lineage-tracking/pytorch-model/train_deploy.py |
| TestingData | ContributedTo | pytorch-hosted-model-2021-08-26-15-55-22-071-aws-training-job | arn:aws:sagemaker:us-west-2:000000000000:artifact/1ae9dfab7a3817cbf14708d932d9142d | arn:aws:sagemaker:us-west-2:000000000000:experiment-trial-component/pytorch-hosted-model-2021-08-26-15-55-22-071-aws-training-job | s3://sagemaker-us-west-2-000000000000/ml-lineage-tracking-v1/test.npy | None | None |
| TrainingData | ContributedTo | pytorch-hosted-model-2021-08-26-15-55-22-071-aws-training-job | arn:aws:sagemaker:us-west-2:000000000000:artifact/a0fd47c730f883b8e5228577fc5d5ef4 | arn:aws:sagemaker:us-west-2:000000000000:experiment-trial-component/pytorch-hosted-model-2021-08-26-15-55-22-071-aws-training-job | s3://sagemaker-us-west-2-000000000000/ml-lineage-tracking-v1/train.npy | CnNlbGVjdCAqCmZyb20gImJvc3Rvbi1ob3VzaW5nLXY1LTE2Mjk3MzEyNjkiCg== | None |
| fg-boston-housing-v5 | ContributedTo | TestingData | arn:aws:sagemaker:us-west-2:000000000000:artifact/1969cb21bf48405e0f2bb2d33f48b7b2 | arn:aws:sagemaker:us-west-2:000000000000:artifact/1ae9dfab7a3817cbf14708d932d9142d | arn:aws:sagemaker:us-west-2:000000000000:feature-group/boston-housing-v5 | None | None |
| fg-boston-housing | ContributedTo | TestingData | arn:aws:sagemaker:us-west-2:000000000000:artifact/d1b82165341cd78b93995d492b5adf7f | arn:aws:sagemaker:us-west-2:000000000000:artifact/1ae9dfab7a3817cbf14708d932d9142d | arn:aws:sagemaker:us-west-2:000000000000:feature-group/boston-housing | None | None |
| ProcessingJob | ContributedTo | fg-boston-housing-v5 | arn:aws:sagemaker:us-west-2:000000000000:artifact/0a665c42c57f3b561e18a51a327d0a2f | arn:aws:sagemaker:us-west-2:000000000000:artifact/1969cb21bf48405e0f2bb2d33f48b7b2 | arn:aws:sagemaker:us-west-2:000000000000:processing-job/pytorch-workflow-preprocessing-26-15-41-18 | None | None |
| ProcessingInputData | ContributedTo | ProcessingJob | arn:aws:sagemaker:us-west-2:000000000000:artifact/2204290e557c4c9feaaa4ef7e4d88f0c | arn:aws:sagemaker:us-west-2:000000000000:artifact/0a665c42c57f3b561e18a51a327d0a2f | s3://sagemaker-us-west-2-000000000000/ml-lineage-tracking-v1/data/raw | None | None |
| ProcessingCode | ContributedTo | ProcessingJob | arn:aws:sagemaker:us-west-2:000000000000:artifact/69de4723ab0643c6ca8257bc6fbcfb4f | arn:aws:sagemaker:us-west-2:000000000000:artifact/0a665c42c57f3b561e18a51a327d0a2f | s3://sagemaker-us-west-2-000000000000/pytorch-workflow-preprocessing-26-15-41-18/input/code/preprocessing.py | None | https://gitlab.com/bwlind/ml-lineage-tracking/blob/main/ml-lineage-tracking/processing.py |
| ProcessingJob | ContributedTo | fg-boston-housing | arn:aws:sagemaker:us-west-2:000000000000:artifact/0a665c42c57f3b561e18a51a327d0a2f | arn:aws:sagemaker:us-west-2:000000000000:artifact/d1b82165341cd78b93995d492b5adf7f | arn:aws:sagemaker:us-west-2:000000000000:processing-job/pytorch-workflow-preprocessing-26-15-41-18 | None | None |
| fg-boston-housing-v5 | ContributedTo | TrainingData | arn:aws:sagemaker:us-west-2:000000000000:artifact/1969cb21bf48405e0f2bb2d33f48b7b2 | arn:aws:sagemaker:us-west-2:000000000000:artifact/a0fd47c730f883b8e5228577fc5d5ef4 | arn:aws:sagemaker:us-west-2:000000000000:feature-group/boston-housing-v5 | None | None |
| fg-boston-housing | ContributedTo | TrainingData | arn:aws:sagemaker:us-west-2:000000000000:artifact/d1b82165341cd78b93995d492b5adf7f | arn:aws:sagemaker:us-west-2:000000000000:artifact/a0fd47c730f883b8e5228577fc5d5ef4 | arn:aws:sagemaker:us-west-2:000000000000:feature-group/boston-housing | None | None |
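
The Base64 Feature Store Query String column stores the query in encoded form. To read it back, a minimal sketch using only the Python standard library might look like the following. It assumes the value is standard Base64 over UTF-8 text, and `decode_query` is a hypothetical helper name, not part of ml_lineage_helper.

```python
import base64

# Value copied from the TrainingData row of the lineage output above.
encoded_query = "CnNlbGVjdCAqCmZyb20gImJvc3Rvbi1ob3VzaW5nLXY1LTE2Mjk3MzEyNjkiCg=="


def decode_query(encoded: str) -> str:
    """Decode a Base64 string to text (hypothetical helper, assumes UTF-8)."""
    return base64.b64decode(encoded).decode("utf-8")


decoded_query = decode_query(encoded_query)
# Strip the leading and trailing newlines before printing the SQL text.
print(decoded_query.strip())
```

Decoding the stored value this way lets you re-run or audit the exact query that produced a training set.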

You can also view the lineage as a graph instead of a pandas DataFrame:

ml_lineage.graph()
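
To see why the lineage forms a DAG, the source and destination names from the example output can be treated as directed edges and sorted topologically. The sketch below uses plain Python and is not how `graph()` is implemented; `topological_order` is a hypothetical helper, and `TrainingJob` is shorthand for the long training job name in the output.

```python
from collections import defaultdict, deque

# (source, destination) pairs taken from the Name/Source and Name/Destination
# columns of the example output. "TrainingJob" abbreviates the full job name.
edges = [
    ("ProcessingInputData", "ProcessingJob"),
    ("ProcessingCode", "ProcessingJob"),
    ("ProcessingJob", "fg-boston-housing"),
    ("ProcessingJob", "fg-boston-housing-v5"),
    ("fg-boston-housing", "TrainingData"),
    ("fg-boston-housing-v5", "TrainingData"),
    ("fg-boston-housing", "TestingData"),
    ("fg-boston-housing-v5", "TestingData"),
    ("TrainingData", "TrainingJob"),
    ("TestingData", "TrainingJob"),
    ("TrainingCode", "TrainingJob"),
    ("TrainingJob", "Model"),
]


def topological_order(edges):
    """Return nodes so that every source precedes its destinations (Kahn's algorithm)."""
    children = defaultdict(list)
    in_degree = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        children[src].append(dst)
        in_degree[dst] += 1
        nodes.update((src, dst))

    # Start from nodes with no incoming edges (raw inputs and code).
    queue = deque(sorted(n for n in nodes if in_degree[n] == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                queue.append(child)

    # Any node left unvisited means the edges contain a cycle.
    if len(order) != len(nodes):
        raise ValueError("cycle detected: the edges do not form a DAG")
    return order


order = topological_order(edges)
print(" -> ".join(order))
```

Because every artifact is upstream of the training job, the training job and the model it produced always come last in the ordering.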

If you are starting in a fresh notebook and already have a model whose ML lineage has been tracked, you can get its MLLineageHelper object with the following code:

ml_lineage = MLLineageHelper(sagemaker_model_name_or_model_s3_uri='my-sagemaker-model-name')
ml_lineage.df

Querying ML Lineage

If you have a data source, you can find associated Feature Groups by providing the data source's S3 URI or Artifact ARN:

query_lineage = QueryLineage()
query_lineage.get_feature_groups_from_data_source(artifact_arn_or_s3_uri)

You can also start with a Feature Group, and find associated data sources:

query_lineage = QueryLineage()
query_lineage.get_data_sources_from_feature_group(artifact_or_fg_arn, max_depth=3)
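
The effect of a depth limit such as `max_depth` can be pictured as a bounded upstream walk over the lineage edges. The sketch below is an illustration only: it assumes `max_depth` caps the number of association hops followed, which the source does not confirm, and `upstream` is a hypothetical helper that does not call SageMaker.

```python
from collections import deque

# A subset of the example lineage edges, as (source, destination) pairs.
edges = [
    ("ProcessingInputData", "ProcessingJob"),
    ("ProcessingCode", "ProcessingJob"),
    ("ProcessingJob", "fg-boston-housing"),
]


def upstream(edges, start, max_depth):
    """Return {ancestor: hops} for ancestors of `start` within `max_depth` hops."""
    # Map each destination to the sources that contributed to it.
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, []).append(src)

    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        depth = seen[node]
        if depth == max_depth:
            continue  # assumed meaning of the limit: stop expanding here
        for parent in parents.get(node, []):
            if parent not in seen:
                seen[parent] = depth + 1
                queue.append(parent)

    del seen[start]  # report ancestors only, not the starting node
    return seen


one_hop = upstream(edges, "fg-boston-housing", max_depth=1)
two_hops = upstream(edges, "fg-boston-housing", max_depth=2)
print(one_hop)
print(two_hops)
```

Under this reading, a small limit reaches only the processing job, while a larger one also reaches the raw input data and processing code behind it.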

Given a Feature Group, you can also find associated models:

query_lineage = QueryLineage()
query_lineage.get_models_from_feature_group(artifact_or_fg_arn)

Given a SageMaker model name or artifact ARN, you can find associated Feature Groups:

query_lineage = QueryLineage()
query_lineage.get_feature_groups_from_model(artifact_arn_or_model_name)

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Owner
AWS Samples