K-means clustering is a method used for clustering analysis, especially in data mining and statistics.

Overview

K Means Algorithm

What is K Means

This algorithm is an iterative algorithm that partitions the dataset according to their features into K number of predefined non- overlapping distinct clusters or subgroups. It makes the data points of inter clusters as similar as possible and also tries to keep the clusters as far as possible. It allocates the data points to a cluster if the sum of the squared distance between the cluster’s centroid and the data points is at a minimum, where the cluster’s centroid is the arithmetic mean of the data points that are in the cluster. A less variation in the cluster results in similar or homogeneous data points within the cluster.

Sources :

  1. Nvidia
  2. Wikipedia

How K Means works

  1. Specify number of clusters K.
  2. Initialize centroids by first shuffling the dataset and then randomly selecting K data points for the centroids without replacement.
  3. Keep iterating until there is no change to the centroids. i.e assignment of data points to clusters isn’t changing.
  4. Compute the euclidean distance
  5. Assign each data point to the closest cluster (centroid).
  6. Compute the centroids for the clusters by taking the average of the all data points that belong to each cluster.

Flow Chart

picture alt


K Means in action

2D:

picture alt

3D:

picture alt

In this Repo a simple Sklearn Model will be trained and pushed to MLFlow

SKlearn_to_MLFLow In this Repo a simple Sklearn Model will be trained and pushed to MLFlow Install This Repo is based on poetry python3 -m venv .venv

1 Dec 13, 2021
A Python library for detecting patterns and anomalies in massive datasets using the Matrix Profile

matrixprofile-ts matrixprofile-ts is a Python 2 and 3 library for evaluating time series data using the Matrix Profile algorithms developed by the Keo

Target 696 Dec 26, 2022
Kaggler is a Python package for lightweight online machine learning algorithms and utility functions for ETL and data analysis.

Kaggler is a Python package for lightweight online machine learning algorithms and utility functions for ETL and data analysis. It is distributed under the MIT License.

Jeong-Yoon Lee 720 Dec 25, 2022
Learn how to responsibly deliver value with ML.

Made With ML Applied ML · MLOps · Production Join 30K+ developers in learning how to responsibly deliver value with ML. 🔥 Among the top MLOps reposit

Goku Mohandas 32k Dec 30, 2022
A simple and lightweight genetic algorithm for optimization of any machine learning model

geneticml This package contains a simple and lightweight genetic algorithm for optimization of any machine learning model. Installation Use pip to ins

Allan Barcelos 8 Aug 10, 2022
Forecast dynamically at scale with this unique package. pip install scalecast

🌄 Scalecast: Dynamic Forecasting at Scale About This package uses a scaleable forecasting approach in Python with common scikit-learn and statsmodels

Michael Keith 158 Jan 03, 2023
Simple and flexible ML workflow engine.

This is a simple and flexible ML workflow engine. It helps to orchestrate events across a set of microservices and create executable flow to handle requests. Engine is designed to be configurable wit

Katana ML 295 Jan 06, 2023
PyNNDescent is a Python nearest neighbor descent for approximate nearest neighbors.

PyNNDescent PyNNDescent is a Python nearest neighbor descent for approximate nearest neighbors. It provides a python implementation of Nearest Neighbo

Leland McInnes 699 Jan 09, 2023
Python package for stacking (machine learning technique)

vecstack Python package for stacking (stacked generalization) featuring lightweight functional API and fully compatible scikit-learn API Convenient wa

Igor Ivanov 671 Dec 25, 2022
A modular active learning framework for Python

Modular Active Learning framework for Python3 Page contents Introduction Active learning from bird's-eye view modAL in action From zero to one in a fe

modAL 1.9k Dec 31, 2022
Tools for diffing and merging of Jupyter notebooks.

nbdime provides tools for diffing and merging of Jupyter Notebooks.

Project Jupyter 2.3k Jan 03, 2023
Nixtla is an open-source time series forecasting library.

Nixtla Nixtla is an open-source time series forecasting library. We are helping data scientists and developers to have access to open source state-of-

Nixtla 401 Jan 08, 2023
Python Machine Learning Jupyter Notebooks (ML website)

Python Machine Learning Jupyter Notebooks (ML website) Dr. Tirthajyoti Sarkar, Fremont, California (Please feel free to connect on LinkedIn here) Also

Tirthajyoti Sarkar 2.6k Jan 03, 2023
Price forecasting of SGB and IRFC Bonds and comparing there returns

Project_Bonds Project Title : Price forecasting of SGB and IRFC Bonds and comparing there returns. Introduction of the Project The 2008-09 global fina

Tishya S 1 Oct 28, 2021
ELI5 is a Python package which helps to debug machine learning classifiers and explain their predictions

A library for debugging/inspecting machine learning classifiers and explaining their predictions

154 Dec 17, 2022
Esse é o meu primeiro repo tratando de fim a fim, uma pipeline de dados abertos do governo brasileiro relacionado a compras de contrato e cronogramas anuais com spark, em pyspark e SQL!

Olá! Esse é o meu primeiro repo tratando de fim a fim, uma pipeline de dados abertos do governo brasileiro relacionado a compras de contrato e cronogr

Henrique de Paula 10 Apr 04, 2022
A Tools that help Data Scientists and ML engineers train and deploy ML models.

Domino Research This repo contains projects under active development by the Domino R&D team. We build tools that help Data Scientists and ML engineers

Domino Data Lab 73 Oct 17, 2022
PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows.

An open-source, low-code machine learning library in Python 🚀 Version 2.3.5 out now! Check out the release notes here. Official • Docs • Install • Tu

PyCaret 6.7k Jan 08, 2023
CinnaMon is a Python library which offers a number of tools to detect, explain, and correct data drift in a machine learning system

CinnaMon is a Python library which offers a number of tools to detect, explain, and correct data drift in a machine learning system

Zelros 67 Dec 28, 2022
A linear equation solver using gaussian elimination. Implemented for fun and learning/teaching.

A linear equation solver using gaussian elimination. Implemented for fun and learning/teaching. The solver will solve equations of the type: A can be

Sanjeet N. Dasharath 3 Feb 15, 2022