This repository contains the full machine learning pipeline for the Zillow Houses competition on the Kaggle platform.

Last update: Jan 09, 2022

Zillow-Houses

The pipeline consists of 10 general steps:

1. Exploratory Data Analysis (univariate and bivariate analysis, hypothesis testing, confidence intervals)
2. Missing values (imputation strategies from simple to advanced, e.g. the MICE algorithm with gradient boosting estimators such as LightGBM)
3. Duplicate checking
4. Advanced Anomaly Detection (base models such as KNN and Isolation Forest, plus a final detector, SUOD, which aggregates the results of the base models)
5. Resolving multicollinearity
6. Feature Engineering
7. Feature Transformation of selected features, backed by hypothesis testing (fitting distributions and checking the fit with statistical tests)
8. Feature Selection, basic and advanced: Recursive Feature Elimination with cross-validation on tree-based models (Gradient Boosting, Random Forests, etc.), Lasso (L1 regularization), and tree feature importances, all combined into one algorithm which takes every one of these methods into account
9. Modeling (different regression models, fine-tuning, learning curves, validation curves, residual analysis, etc.). Later, I want to try some stacking strategies on boosted trees and some NN models
10. Results analysis: best model selection using confidence intervals and various non-parametric statistical tests
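
As a rough illustration of the results-analysis step, here is a minimal sketch of how two models could be compared with a bootstrap confidence interval and a non-parametric test. It is not the code from this repository: the function names `paired_bootstrap_ci` and `sign_flip_p_value` are hypothetical, the per-fold RMSE values are made up, and the paired sign-flip permutation test is just one possible choice of non-parametric test.

```python
import random
import statistics


def paired_bootstrap_ci(errors_a, errors_b, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of (errors_a - errors_b), paired by fold."""
    if len(errors_a) != len(errors_b):
        raise ValueError("both models need one score per fold")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    means = []
    for _ in range(n_resamples):
        # Resample the fold-wise differences with replacement.
        sample = rng.choices(diffs, k=len(diffs))
        means.append(statistics.fmean(sample))
    means.sort()
    low = means[int((alpha / 2) * n_resamples)]
    high = means[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high


def sign_flip_p_value(errors_a, errors_b, n_permutations=5000, seed=0):
    """Two-sided paired permutation test that randomly flips the sign of each difference."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    observed = abs(statistics.fmean(diffs))
    hits = 0
    for _ in range(n_permutations):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(statistics.fmean(flipped)) >= observed:
            hits += 1
    # The +1 correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Made-up per-fold RMSE values, for illustration only.
rmse_model_a = [0.142, 0.138, 0.151, 0.147, 0.140, 0.144, 0.149, 0.139]
rmse_model_b = [0.131, 0.129, 0.140, 0.139, 0.133, 0.135, 0.138, 0.130]

ci_low, ci_high = paired_bootstrap_ci(rmse_model_a, rmse_model_b)
p_value = sign_flip_p_value(rmse_model_a, rmse_model_b)
print(f"95% CI for mean RMSE difference (A - B): [{ci_low:.4f}, {ci_high:.4f}]")
print(f"sign-flip permutation p-value: {p_value:.4f}")
```

If the interval excludes zero and the p-value is small, the difference between the two models is unlikely to be a cross-validation artifact. With only a handful of folds, both numbers should be read with caution.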

This solution also contains a custom preprocessing pipeline which can automatically perform steps 2-8 (all in :) )
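
To give an idea of how a chained preprocessing pipeline of this kind can be structured, here is a minimal, dependency-free sketch. It is not the pipeline from this repository: the names `impute_median`, `drop_duplicates`, `drop_iqr_outliers` and `run_pipeline` are hypothetical, the toy rows are made up, and the steps use deliberately simple stand-ins (median imputation instead of MICE, the IQR rule instead of SUOD) and cover only three of the steps.

```python
import statistics


def impute_median(rows, column):
    """Fill missing values (None) in `column` with the column median."""
    observed = [r[column] for r in rows if r[column] is not None]
    median = statistics.median(observed)
    return [{**r, column: median if r[column] is None else r[column]} for r in rows]


def drop_duplicates(rows):
    """Keep the first occurrence of each identical row."""
    seen, unique = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def drop_iqr_outliers(rows, column, k=1.5):
    """Drop rows whose `column` value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles([r[column] for r in rows], n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r for r in rows if low <= r[column] <= high]


def run_pipeline(rows, steps):
    """Apply each (name, function) step in order and report the row count."""
    for name, step in steps:
        rows = step(rows)
        print(f"{name}: {len(rows)} rows")
    return rows


# Made-up toy data: one missing area, one duplicate row, one extreme price.
raw_rows = [
    {"area": 50, "price": 100},
    {"area": 60, "price": 120},
    {"area": None, "price": 130},
    {"area": 60, "price": 120},
    {"area": 70, "price": 140},
    {"area": 80, "price": 160},
    {"area": 90, "price": 5000},
    {"area": 55, "price": 110},
]

steps = [
    ("impute missing area", lambda rows: impute_median(rows, "area")),
    ("drop duplicates", drop_duplicates),
    ("drop price outliers", lambda rows: drop_iqr_outliers(rows, "price")),
]

clean_rows = run_pipeline(raw_rows, steps)
```

Keeping every step as a plain function from rows to rows makes it easy to reorder steps, skip some of them, or swap a simple stand-in for a stronger method later.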
