MLReef is an open source ML-Ops platform that helps you collaborate, reproduce and share your Machine Learning work with thousands of other users.

Overview

The collaboration platform for Machine Learning

MLReef is an open source ML-Ops platform that helps you collaborate, reproduce and share your Machine Learning work with thousands of other users.


MLReef

MLReef is a ML/DL development platform containing four main sections:

  • Data-Management - Fully versioned data hosting and processing infrastructure
  • Publishing code repositories - Containerized and versioned script repositories for immutable use in data pipelines
  • Experiment Manager - Experiment tracking, environments and results
  • ML-Ops - Pipelines & Orchestration solution for ML/DL jobs (K8s / Cloud / bare-metal)


To find out more about how MLReef can streamline your Machine Learning Development Lifecycle visit our homepage

Data Management

  • Host your data using git / git LFS repositories.
    • Work concurrently on data
    • Fully versioned or LFS version control
    • Full view on data processing and visualization history
  • Connect your external storage to MLReef and use your data directly in pipelines
  • Data set management (access, history, pipelines)

Publishing Code

Adding only parameter annotations to your code...

# example of parameter annotation for a image crop function
 @data_processor(
        name="Resnet50",
        author="MLReef",
        command="resnet50",
        type="ALGORITHM",
        description="CNN Model resnet50",
        visibility="PUBLIC",
        input_type="IMAGE",
        output_type="MODEL"
    )
    @parameter(name='input-path', type='str', required=True, defaultValue='train', description="input path")
    @parameter(name='output-path', type='str', required=True, defaultValue='output', description="output path")
    @parameter(name='height', type='int', required=True, defaultValue=224, description="height of cropped images in px")
    @parameter(name='width', type='int', required=True, defaultValue=224, description="width of cropped images in px")
    def init_params():
        pass

...and publishing your scripts gets you the following:

  • Containerization of your scripts
    • Always working scripts including easy hyperparameter access in pipelines
    • Execution environment (including specific packages & versions)
    • Hyper-parameters
      • ArgParser for command line parameters with currently used values
      • Explicit parameters dictionary
      • Input validation and guides
  • Multiple containers based on version and code branches

Experiment Manager

  • Complete experiment setup log
    • Full source control info including non-committed local changes
    • Execution environment (including specific packages & versions)
    • Hyper-parameters
  • Full experiment output automatic capture
    • Artifacts storage and standard-output logs
    • Performance metrics on individual experiments and comparative graphs for all experiments
    • Detailed view on logs and outputs generated
  • Extensive platform support and integrations

ML-Ops

  • Concurrent computing pipelining
  • Governance and control
    • Access and user management
    • Single permission management
    • Resource management
  • Model management

MLReef Architecture

The MLReef ML components within the ML life cycle:

  • Data Storage components based currently on Git and Git LFS.
  • Model development based on working modules (published by the community or your team), data management, data processing / data visualization / experiment pipeline on hosted or on-prem and model management.
  • ML-Ops orchestration, experiment and workflow reproducibility, and scalability.

Why MLReef?

MLReef is our solution to a problem we share with countless other researchers and developers in the machine learning/deep learning universe: Training production-grade deep learning models is a tangled process. MLReef tracks and controls the process by associating code version control, research projects, performance metrics, and model provenance.

We designed MLReef on best data science practices combined with the knowleged gained from DevOps and a deep focus on collaboration.

  • Use it on a daily basis to boost collaboration and visibility in your team
  • Create a job in the cloud from any code repository with a click of a button
  • Automate processes and create pipelines to collect your experimentation logs, outputs, and data
  • Make you ML life cycle transparent by cataloging it all on the MLReef platform

Getting Started as a Developer

To start developing, continue with the developer guide

Canonical source

The canonical source of MLReef where all development takes place is hosted on gitLab.com/mlreef/mlreef.

License

MIT License (see the License for more information)

Documentation, Community and Support

More information in the official documentation and on Youtube.

For examples and use cases, check these use cases or start the tutorial after registring:

If you have any questions: post on our Slack channel, or tag your questions on stackoverflow with 'mlreef' tag.

For feature requests or bug reports, please use GitLab issues.

Additionally, you can always reach out to us via [email protected]

Contributing

Merge Requests are always welcomed ❤️ See more details in the MLReef Contribution Guidelines.

Owner
MLReef
Your entire Machine Learning life cycle in one platform.
MLReef
My capstone project for Udacity's Machine Learning Nanodegree

MLND-Capstone My capstone project for Udacity's Machine Learning Nanodegree Lane Detection with Deep Learning In this project, I use a deep learning-b

Michael Virgo 407 Dec 12, 2022
ParaMonte is a serial/parallel library of Monte Carlo routines for sampling mathematical objective functions of arbitrary-dimensions

ParaMonte is a serial/parallel library of Monte Carlo routines for sampling mathematical objective functions of arbitrary-dimensions, in particular, the posterior distributions of Bayesian models in

Computational Data Science Lab 182 Dec 31, 2022
This project impelemented for midterm of the Machine Learning #Zoomcamp #Alexey Grigorev

MLProject_01 This project impelemented for midterm of the Machine Learning #Zoomcamp #Alexey Grigorev Context Dataset English question data set file F

Hadi Nakhi 1 Dec 18, 2021
Programming assignments and quizzes from all courses within the Machine Learning Engineering for Production (MLOps) specialization offered by deeplearning.ai

Machine Learning Engineering for Production (MLOps) Specialization on Coursera (offered by deeplearning.ai) Programming assignments from all courses i

Aman Chadha 173 Jan 05, 2023
Simulation of early COVID-19 using SIR model and variants (SEIR ...).

COVID-19-simulation Simulation of early COVID-19 using SIR model and variants (SEIR ...). Made by the Laboratory of Sustainable Life Assessment (GYRO)

José Paulo Pereira das Dores Savioli 1 Nov 17, 2021
Visualize classified time series data with interactive Sankey plots in Google Earth Engine

sankee Visualize changes in classified time series data with interactive Sankey plots in Google Earth Engine Contents Description Installation Using P

Aaron Zuspan 76 Dec 15, 2022
Decision tree is the most powerful and popular tool for classification and prediction

Diabetes Prediction Using Decision Tree Introduction Decision tree is the most powerful and popular tool for classification and prediction. A Decision

Arjun U 1 Jan 23, 2022
🤖 ⚡ scikit-learn tips

🤖 ⚡ scikit-learn tips New tips are posted on LinkedIn, Twitter, and Facebook. 👉 Sign up to receive 2 video tips by email every week! 👈 List of all

Kevin Markham 1.6k Jan 03, 2023
Free MLOps course from DataTalks.Club

MLOps Zoomcamp Our MLOps Zoomcamp course Sign up here: https://airtable.com/shrCb8y6eTbPKwSTL (it's not automated, you will not receive an email immed

DataTalksClub 4.6k Dec 31, 2022
Hierarchical Time Series Forecasting using Prophet

htsprophet Hierarchical Time Series Forecasting using Prophet Credit to Rob J. Hyndman and research partners as much of the code was developed with th

Collin Rooney 131 Dec 02, 2022
Neighbourhood Retrieval (Nearest Neighbours) with Distance Correlation.

Neighbourhood Retrieval with Distance Correlation Assign Pseudo class labels to datapoints in the latent space. NNDC is a slim wrapper around FAISS. N

The Learning Machines 1 Jan 16, 2022
A logistic regression model for health insurance purchasing prediction

Logistic_Regression_Model A logistic regression model for health insurance purchasing prediction This code is using these packages, so please make sur

ShawnWang 1 Nov 29, 2021
A Python implementation of GRAIL, a generic framework to learn compact time series representations.

GRAIL A Python implementation of GRAIL, a generic framework to learn compact time series representations. Requirements Python 3.6+ numpy scipy tslearn

3 Nov 24, 2021
Simple, light-weight config handling through python data classes with to/from JSON serialization/deserialization.

Simple but maybe too simple config management through python data classes. We use it for machine learning.

Eren Gölge 67 Nov 29, 2022
CyLP is a Python interface to COIN-OR’s Linear and mixed-integer program solvers (CLP, CBC, and CGL)

CyLP CyLP is a Python interface to COIN-OR’s Linear and mixed-integer program solvers (CLP, CBC, and CGL). CyLP’s unique feature is that you can use i

COIN-OR Foundation 161 Dec 14, 2022
Crunchdao - Python API for the Crunchdao machine learning tournament

Python API for the Crunchdao machine learning tournament Interact with the Crunc

3 Jan 19, 2022
Predict the income for each percentile of the population (Python) - FRENCH

05.income-prediction Predict the income for each percentile of the population (Python) - FRENCH Effectuez une prédiction de revenus Prérequis Pour ce

1 Feb 13, 2022
End to End toy example of MLOps

churn_model MLOps Toy Example End to End You might find below links useful Connect VSCode to Git MLFlow Port Heroku App Project Organization ├── LICEN

Ashish Tele 6 Feb 06, 2022
The MLOps is the process of continuous integration and continuous delivery of Machine Learning artifacts as a software product, keeping it inside a loop of Design, Model Development and Operations.

MLOps The MLOps is the process of continuous integration and continuous delivery of Machine Learning artifacts as a software product, keeping it insid

Maykon Schots 25 Nov 27, 2022
Microsoft 5.6k Jan 07, 2023