Data preprocessing and feature engineering script for a machine learning pipeline

Last update: Apr 21, 2022

Related tags

Machine Learning Feature-Engineering

Overview

Feature-Engineering

A data preprocessing and feature engineering script needs to be prepared for a machine learning pipeline.

Once the dataset has passed through this script, it is expected to be ready for modeling.
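As a rough illustration of what "ready for modeling" could mean, the sketch below treats a dataset as ready when it has no missing values and every column is numeric. This is an assumption, not the project's stated definition. The helper names `prepare_for_modeling` and `is_model_ready` are hypothetical, as is the tiny sample frame, and median/mode imputation with one-hot encoding is only one common choice among several.

```python
import pandas as pd


def prepare_for_modeling(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Hypothetical helper: impute missing values, then encode categoricals."""
    out = df.copy()
    feature_cols = [c for c in out.columns if c != target]

    for col in feature_cols:
        if pd.api.types.is_numeric_dtype(out[col]):
            # Numeric gaps: fill with the column median.
            out[col] = out[col].fillna(out[col].median())
        else:
            # Categorical gaps: fill with the most frequent value.
            out[col] = out[col].fillna(out[col].mode().iloc[0])

    # One-hot encode the non-numeric feature columns.
    cat_cols = [c for c in feature_cols
                if not pd.api.types.is_numeric_dtype(out[c])]
    return pd.get_dummies(out, columns=cat_cols, drop_first=True, dtype=int)


def is_model_ready(df: pd.DataFrame) -> bool:
    """Assumed readiness rule: no missing values and only numeric columns."""
    no_missing = not df.isna().any().any()
    all_numeric = all(pd.api.types.is_numeric_dtype(df[c]) for c in df.columns)
    return bool(no_missing and all_numeric)


# Made-up rows for illustration only, not taken from the real dataset.
raw = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Age": [22.0, None, 26.0, 35.0],
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", None, "S"],
})
ready = prepare_for_modeling(raw, target="Survived")
print(is_model_ready(raw), is_model_ready(ready))
```

The raw frame fails the check because of its missing values and text columns, while the prepared frame passes it.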

Dataset Story

The dataset describes the people who were aboard the Titanic when it sank. It consists of 768 observations and 12 variables. The target variable is "Survived": 1 indicates that the passenger survived, and 0 indicates that the passenger did not survive.
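A minimal sketch of separating the target from the features and checking the class balance is shown here. The rows are made up for illustration, since the file name and location are not given in this description. Dropping "PassengerId" is an assumption based on it being a plain identifier.

```python
import pandas as pd

# Made-up sample; the real data would be read from the project's own file.
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Survived": [0, 1, 1, 1, 0],
    "Pclass": [3, 1, 3, 1, 3],
})

# Target vector: 1 = survived, 0 = did not survive.
y = df["Survived"]

# Feature matrix: everything except the target and the identifier column.
X = df.drop(columns=["Survived", "PassengerId"])

# Share of passengers who survived, and the count per class.
survival_rate = y.mean()
counts = y.value_counts().to_dict()
print(survival_rate, counts)
```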

Variables

PassengerId: ID of the passenger
Survived: Survival status (0: not survived, 1: survived)
Pclass: Ticket class (1: 1st class (upper), 2: 2nd class (middle), 3: 3rd class (lower))
Name: Name of the passenger
Sex: Gender of the passenger (male, female)
Age: Age in years
SibSp: Number of siblings/spouses aboard the Titanic
- Sibling = Brother, sister, stepbrother, stepsister
- Spouse = Husband, wife (mistresses and fiancés were ignored)
Parch: Number of parents/children aboard the Titanic
- Parent = Mother, father
- Child = Daughter, son, stepdaughter, stepson
- Some children travelled only with a nanny, therefore Parch = 0 for them.
Ticket: Ticket number
Fare: Passenger fare
Cabin: Cabin number
Embarked: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)
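The variables above lend themselves to a few derived features. The sketch below shows some commonly used ones; it is not necessarily what this project's script does. The function name `add_titanic_features` and the new column names are hypothetical, the column casing (`SibSp`, `Parch`) is assumed to match the usual CSV header, and the sample rows are illustrative only.

```python
import pandas as pd


def add_titanic_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: derive a few features from the raw columns."""
    out = df.copy()

    # Family size: siblings/spouses + parents/children + the passenger.
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1

    # Flag passengers travelling without any family member.
    out["IsAlone"] = (out["FamilySize"] == 1).astype(int)

    # Cabin is often missing, so record whether a cabin number exists.
    out["HasCabin"] = out["Cabin"].notna().astype(int)

    # The title sits between the comma and the first period in Name.
    out["Title"] = (
        out["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
    )
    return out


# Illustrative rows in the usual "Surname, Title. Given names" format.
sample = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
        "Heikkinen, Miss. Laina",
    ],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 0],
    "Cabin": [None, "C85", None],
})
features = add_titanic_features(sample)
print(features[["FamilySize", "IsAlone", "HasCabin", "Title"]])
```

Note that a child travelling only with a nanny has Parch = 0, so such a passenger can be flagged as alone even though an adult accompanied them.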

REFERENCE: Data Science and ML Boot Camp, 2021, Veri Bilimi Okulu (https://www.veribilimiokulu.com/)

Owner

kemalgunay
