Can a machine learning project be implemented to estimate the salaries of baseball players whose salary information and career statistics for 1986 are shared?

Overview

END TO END MACHINE LEARNING PROJECT ON HITTERS DATASET

Can a machine learning project be implemented to estimate the salaries of baseball players whose salary information and career statistics for 1986 are shared?

DATA SET STORY:

  • This dataset was originally taken from the StatLib library at Carnegie Mellon University.
  • This is part of the data that was used in the 1988 ASA Graphics Section Poster Session.
  • The salary data were originally from Sports Illustrated, April 20, 1987.
  • The 1986 and career statistics were obtained from The 1987 Baseball Encyclopedia Update published by Collier Books, Macmillan Publishing Company, New York.

ATTRIBUTES: A data frame with 322 observations of major league players on the following 20 variables.

  • AtBat: Number of times at bat in 1986-1987 season
  • Hits: Number of hits in 1986-1987 season
  • HmRun: Number of home runs in 1986-1987 season
  • Runs: Number of runs in 1986-1987 season
  • RBI: Number of runs batted in 1986-1987 season
  • Walks: Number of walks in 1986-1987 season
  • Years: Number of years in the major leagues
  • CAtBat: Number of times at bat during his career
  • CHits: Number of hits during his career
  • CHmRun: Number of home runs during his career
  • CRuns: Number of runs during his career
  • CRBI: Number of runs batted in during his career
  • CWalks: Number of walks during his career
  • League: A factor with levels A and N indicating player's league at the end of 1986
  • Division: A factor with levels E and W indicating player's division at the end of 1986
  • PutOuts: Number of put outs in 1986-1987 season
  • Assists: Number of assists in 1986-1987 season
  • Errors: Number of errors in 1986-1987 season
  • Salary: 1996-1987 annual salary on opening day in thousands of dollars
  • NewLeague: A factor with levels A and N indicating player's league at the beginning of 1987
Owner
Pinar Oner
Data Engineer | Project Coordinator
Pinar Oner
A linear regression model for house price prediction

Linear_Regression_Model A linear regression model for house price prediction. This code is using these packages, so please make sure your have install

ShawnWang 1 Nov 29, 2021
🌲 Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams

🌲 Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams

Real-time water systems lab 416 Jan 06, 2023
Formulae is a Python library that implements Wilkinson's formulas for mixed-effects models.

formulae formulae is a Python library that implements Wilkinson's formulas for mixed-effects models. The main difference with other implementations li

34 Dec 21, 2022
Multiple Linear Regression using the LinearRegression class from sklearn.linear_model library

Multiple-Linear-Regression-master - A python program to implement Multiple Linear Regression using the LinearRegression class from sklearn.linear model library

Kushal Shingote 1 Feb 06, 2022
Open MLOps - A Production-focused Open-Source Machine Learning Framework

Open MLOps - A Production-focused Open-Source Machine Learning Framework Open MLOps is a set of open-source tools carefully chosen to ease user experi

Data Revenue 590 Dec 28, 2022
[DEPRECATED] Tensorflow wrapper for DataFrames on Apache Spark

TensorFrames (Deprecated) Note: TensorFrames is deprecated. You can use pandas UDF instead. Experimental TensorFlow binding for Scala and Apache Spark

Databricks 757 Dec 31, 2022
Decision tree is the most powerful and popular tool for classification and prediction

Diabetes Prediction Using Decision Tree Introduction Decision tree is the most powerful and popular tool for classification and prediction. A Decision

Arjun U 1 Jan 23, 2022
Deepchecks is a Python package for comprehensively validating your machine learning models and data with minimal effort

Deepchecks is a Python package for comprehensively validating your machine learning models and data with minimal effort

2.3k Jan 04, 2023
A simple python program that draws a tree for incrementing values using the Collatz Conjecture.

Collatz Conjecture A simple python program that draws a tree for incrementing values using the Collatz Conjecture. Values which can be edited: Length

davidgasinski 1 Oct 28, 2021
Painless Machine Learning for python based on scikit-learn

PlainML Painless Machine Learning Library for python based on scikit-learn. Install pip install plainml Example from plainml import KnnModel, load_ir

1 Aug 06, 2022
Reggy - Regressions with arbitrarily complex regularization terms

reggy Regressions with arbitrarily complex regularization terms. Currently suppo

Kim 1 Jan 20, 2022
Python implementation of the rulefit algorithm

RuleFit Implementation of a rule based prediction algorithm based on the rulefit algorithm from Friedman and Popescu (PDF) The algorithm can be used f

Christoph Molnar 326 Jan 02, 2023
This project has Classification and Clustering done Via kNN and K-Means respectfully

This project has Classification and Clustering done Via kNN and K-Means respectfully. It later tests its efficiency via F1/accuracy/recall/precision for kNN and Davies-Bouldin Index for Clustering. T

Mohammad Ali Mustafa 0 Jan 20, 2022
This is an auto-ML tool specialized in detecting of outliers

Auto-ML tool specialized in detecting of outliers Description This tool will allows you, with a Dash visualization, to compare 10 models of machine le

1 Nov 03, 2021
Simple data balancing baselines for worst-group-accuracy benchmarks.

BalancingGroups Code to replicate the experimental results from Simple data balancing baselines achieve competitive worst-group-accuracy. Replicating

Facebook Research 29 Dec 02, 2022
Warren - Stock Price Predictor

Web app to predict closing stock prices in real time using Facebook's Prophet time series algorithm with a multi-variate, single-step time series forecasting strategy.

Kumar Nityan Suman 153 Jan 03, 2023
Laporan Proyek Machine Learning - Azhar Rizki Zulma

Laporan Proyek Machine Learning - Azhar Rizki Zulma Project Overview Domain proyek yang dipilih dalam proyek machine learning ini adalah mengenai hibu

Azhar Rizki Zulma 6 Mar 12, 2022
This repository has datasets containing information of Uber pickups in NYC from April 2014 to September 2014 and January to June 2015. data Analysis , virtualization and some insights are gathered here

uber-pickups-analysis Data Source: https://www.kaggle.com/fivethirtyeight/uber-pickups-in-new-york-city Information about data set The dataset contain

B DEVA DEEKSHITH 1 Nov 03, 2021
🎛 Distributed machine learning made simple.

🎛 lazycluster Distributed machine learning made simple. Use your preferred distributed ML framework like a lazy engineer. Getting Started • Highlight

Machine Learning Tooling 44 Nov 27, 2022
Predicting diabetes over a five year period using logistic regression and the Pima First-Nation dataset

Diabetes This script uses the Pima First Nations dataset to create a model to predict whether or not an individual will develop Diabetes Mellitus Type

1 Mar 28, 2022