This is my implementation on the K-nearest neighbors algorithm from scratch using Python

Overview

K Nearest Neighbors (KNN) algorithm

In this Machine Learning world, there are various algorithms designed for classification problems such as Logistic Regression, Decision Tree, Random Forest, Gradient Boosting over Decision Trees, etc. and KNN is not exceptional. When approaching to a classification problem, off the top of my head I will come up with the algorithm (KNN) first because it may be the most basic and simple algorithm to implement. In this project, I'm going to build a KNN model to do tasks on classifying flower types in the dataset Iris.

About data

The dataset I'm going to use is a flower dataset called Iris. This is a bit of information about this dataset that you should care about:

  • The data set contains 3 classes (Iris Setosa, Iris Versicolor, Iris Virginica) of 50 instances each, where each class refers to a type of iris plant.
  • There are 4 attributes in total, which are sepal length in cm, sepal width in cm, petal length in cm, petal width in cm.
  • Classes: Iris setosa, Iris versicolor, Iris virginica

Iris species

Environment

MacOS Monterey 12.1, Anaconda Virtual Environment, Python 3.9.7 64-bit

Requirements

  1. Numpy
  2. Pandas
  3. Scikit-learn

KNN procedures

  1. Load the data

  2. Initialise the value of K

  3. For getting the predicted class, iterate from 1 to the total number of training data points

    3.1 Calculate the distance between a test sample and each row of training data. Here we will use L2 norm (Euclidean distance) as our distance metric since it’s the most popular method

    3.2 Sort the calculated distances in ascending order based on distance values

    3.3 Get top K nearest neighbors from the sorted array

    3.4 Get a class with the maximum number of votes

    3.5 Return the predicted class

How to run this program

In your project folder, open a terminal and run python knn.py

Result

An example result

Reference:

For more information about the KNN algorithm, please follow this link:

Owner
sonny1902
Every cloud has a silver lining
sonny1902
mlpack: a scalable C++ machine learning library --

a fast, flexible machine learning library Home | Documentation | Doxygen | Community | Help | IRC Chat Download: current stable version (3.4.2) mlpack

mlpack 4.2k Jan 01, 2023
Predict the output which should give a fair idea about the chances of admission for a student for a particular university

Predict the output which should give a fair idea about the chances of admission for a student for a particular university.

ArvindSandhu 1 Jan 11, 2022
pure-predict: Machine learning prediction in pure Python

pure-predict speeds up and slims down machine learning prediction applications. It is a foundational tool for serverless inference or small batch prediction with popular machine learning frameworks l

Ibotta 84 Dec 29, 2022
A basic Ray Tracer that exploits numpy arrays and functions to work fast.

Python-Fast-Raytracer A basic Ray Tracer that exploits numpy arrays and functions to work fast. The code is written keeping as much readability as pos

Rafael de la Fuente 393 Dec 27, 2022
An open-source library of algorithms to analyse time series in GPU and CPU.

An open-source library of algorithms to analyse time series in GPU and CPU.

Shapelets 216 Dec 30, 2022
Pragmatic AI Labs 421 Dec 31, 2022
Predicting job salaries from ads - a Kaggle competition

Predicting job salaries from ads - a Kaggle competition

Zygmunt Zając 57 Oct 23, 2020
LinearRegression2 Tvads and CarSales

LinearRegression2_Tvads_and_CarSales This project infers the insight that how the TV ads for cars and car Sales are being linked with each other. It i

Ashish Kumar Yadav 1 Dec 29, 2021
Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.

Hivemind: decentralized deep learning in PyTorch Hivemind is a PyTorch library to train large neural networks across the Internet. Its intended usage

1.3k Jan 08, 2023
CS 7301: Spring 2021 Course on Advanced Topics in Optimization in Machine Learning

CS 7301: Spring 2021 Course on Advanced Topics in Optimization in Machine Learning

Rishabh Iyer 141 Nov 10, 2022
Programming assignments and quizzes from all courses within the Machine Learning Engineering for Production (MLOps) specialization offered by deeplearning.ai

Machine Learning Engineering for Production (MLOps) Specialization on Coursera (offered by deeplearning.ai) Programming assignments from all courses i

Aman Chadha 173 Jan 05, 2023
Hierarchical Time Series Forecasting using Prophet

htsprophet Hierarchical Time Series Forecasting using Prophet Credit to Rob J. Hyndman and research partners as much of the code was developed with th

Collin Rooney 131 Dec 02, 2022
Forecasting prices using Facebook/Meta's Prophet model

CryptoForecasting using Machine and Deep learning (Part 1) CryptoForecasting using Machine Learning The main aspect of predicting the stock-related da

1 Nov 27, 2021
Python ML pipeline that showcases mltrace functionality.

mltrace tutorial Date: October 2021 This tutorial builds a training and testing pipeline for a toy ML prediction problem: to predict whether a passeng

Log Labs 28 Nov 09, 2022
Titanic Traveller Survivability Prediction

The aim of the mini project is predict whether or not a passenger survived based on attributes such as their age, sex, passenger class, where they embarked and more.

John Phillip 0 Jan 20, 2022
A Pythonic framework for threat modeling

pytm: A Pythonic framework for threat modeling Introduction Traditional threat modeling too often comes late to the party, or sometimes not at all. In

Izar Tarandach 644 Dec 20, 2022
EbookMLCB - ebook Machine Learning cơ bản

Mã nguồn cuốn ebook "Machine Learning cơ bản", Vũ Hữu Tiệp. ebook Machine Learning cơ bản pdf-black_white, pdf-color. Mọi hình thức sao chép, in ấn đề

943 Jan 02, 2023
Data Version Control or DVC is an open-source tool for data science and machine learning projects

Continuous Machine Learning project integration with DVC Data Version Control or DVC is an open-source tool for data science and machine learning proj

Azaria Gebremichael 2 Jul 29, 2021
Anytime Learning At Macroscale

On Anytime Learning At Macroscale Learning from sequential data dumps (key) Requirements Python 3.7 Pytorch 1.9.0 Hydra 1.1.0 (pip install hydra-core

Meta Research 8 Mar 29, 2022
machine learning model deployment project of Iris classification model in a minimal UI using flask web framework and deployed it in Azure cloud using Azure app service

This is a machine learning model deployment project of Iris classification model in a minimal UI using flask web framework and deployed it in Azure cloud using Azure app service. We initially made th

Krishna Priyatham Potluri 73 Dec 01, 2022