Introduction to Statistics and Basics of Mathematics for Data Science - The Hacker's Way

Overview

HackerMath for Machine Learning

“Study hard what interests you the most in the most undisciplined, irreverent and original manner possible.” ― Richard Feynman

Math literacy, including proficiency in Linear Algebra and Statistics,is a must for anyone pursuing a career in data science. The goal of this workshop is to introduce some key concepts from these domains that get used repeatedly in data science applications. Our approach is what we call the “Hacker’s way”. Instead of going back to formulae and proofs, we teach the concepts by writing code. And in practical applications. Concepts don’t remain sticky if the usage is never taught.

The focus will be on depth rather than breadth. Three areas are chosen - Hypothesis Testing, Supervised Learning and Unsupervised Learning. They will be covered to sufficient depth - 50% of the time will be on the concepts and 50% of the time will be spent coding them.

More details at http://amitkaps.com/hackermath

See it in action: Binder

Module #1: Hypothesis Testing

Math Concepts

  • Basic Metrics: Mean, Variance, Covariance, Correlation
  • Discrete Probability Distributions: Bernoulli, Binomial
  • Cumulative Mass Function, Probability Mass Function
  • Continuous Probability Distributions: Poisson, Uniform, Normal, Beta, Gamma
  • Cumulative Distribution Function, Probability Density Function

ML Applications

  • Direct Simulation
  • Shuffling
  • Bootstrapping
  • Application to A/B Testing

Module #2: Supervised Learning

Math Concepts

  • Basics of Matrix Operation
  • Matrix Determinant, Inverse
  • Basics of Linear Algebra
  • Solve for Ax=b for nxn
  • Solve for Ax=b for nxp+1

ML Applications

  • Linear Regression
  • L2 Regularization
  • Gradient Descent
  • Linear Classifier
  • Logistic Regression

Module #3: Unsupervised Learning

Math Concepts

  • Matrix Projections
  • Solve for Ax=λx for nxn
  • Eigenvectors & Eigenvalues
  • Distance in Vector Space

ML Applications

  • Dimensionality Reduction
  • Principle Component Analysis
  • Cluster Analysis

Target Audience

  • Someone with a background in programming who wants to pick the math needed for data science and get a flavor for different data science problems
  • Someone who is a beginner in data science or has been doing data analysis (at least using Excel at a minimum) and wants to pick skills to take the next step in their data science career

Pre-requisites

  • Having a basic understanding of linear algebra would help. And we know you may have forgotten all about it from your school or college days. So here is an amazing video playlist by @3blue1brown to learn The Essence of Linear Algebra in a very visual way.
  • Also, a touch of calculus knowledge would make it also easier. So if you want to brush up your basic calculus skills, then @3blue1brown has another amazing video playlist to learn The Essence of Calculus in a very visual way.
  • Programming knowledge is mandatory. You should, at the bare minimum, be able to write conditional statements, use loops, be comfortable writing functions and be able to understand code snippets and come up with programming logic. Since we will be using Python - brush up your basics there. Specifically, we expect you to know the first three sections from this: http://anandology.com/python-practice-book/

Software Requirements

You will require the Python data stack for the workshop. Please install Ananconda for Python 3.5 for the workshop. That has everything we need for the workshop. For attendees more curious, we will be using Jupyter Notebook as our IDE. We will be introducing numpy, scipy, seaborn, matplotlib, plotnine, statsmodel and scikit-learn.

The working repo for this workshop is at https://github.com/amitkaps/hackermath/


Authors:

Amit Kapoor

Bargava Subramanian

Owner
Amit Kapoor
Crafting Visual Stories with Data.
Amit Kapoor
Simple tools for logging and visualizing, loading and training

TNT TNT is a library providing powerful dataloading, logging and visualization utilities for Python. It is closely integrated with PyTorch and is desi

1.5k Jan 02, 2023
An end-to-end image translation model with weight-map for color constancy

CCUnet An end-to-end image translation model with weight-map for color constancy 1. Download the dataset (take Colorchecker_recommended dataset as an

Jianhui Qiu 1 Dec 21, 2021
A face dataset generator with out-of-focus blur detection and dynamic interval adjustment.

A face dataset generator with out-of-focus blur detection and dynamic interval adjustment.

Yutian Liu 2 Jan 29, 2022
Galaxy images labelled by morphology (shape). Aimed at ML development and teaching

Galaxy images labelled by morphology (shape). Aimed at ML debugging and teaching.

Mike Walmsley 14 Nov 28, 2022
Ensemble Visual-Inertial Odometry (EnVIO)

Ensemble Visual-Inertial Odometry (EnVIO) Authors : Jae Hyung Jung, Yeongkwon Choe, and Chan Gook Park 1. Overview This is a ROS package of Ensemble V

Jae Hyung Jung 95 Jan 03, 2023
Gesture recognition on Event Data

Event based Gesture Recognition Gesture recognition on Event Data usually involv

2 Feb 14, 2022
Uncertainty-aware Semantic Segmentation of LiDAR Point Clouds for Autonomous Driving

SalsaNext: Fast, Uncertainty-aware Semantic Segmentation of LiDAR Point Clouds for Autonomous Driving Abstract In this paper, we introduce SalsaNext f

308 Jan 04, 2023
A simple but complete full-attention transformer with a set of promising experimental features from various papers

x-transformers A concise but fully-featured transformer, complete with a set of promising experimental features from various papers. Install $ pip ins

Phil Wang 2.3k Jan 03, 2023
Softlearning is a reinforcement learning framework for training maximum entropy policies in continuous domains. Includes the official implementation of the Soft Actor-Critic algorithm.

Softlearning Softlearning is a deep reinforcement learning toolbox for training maximum entropy policies in continuous domains. The implementation is

Robotic AI & Learning Lab Berkeley 997 Dec 30, 2022
Simple Tensorflow implementation of Toward Spatially Unbiased Generative Models (ICCV 2021)

Spatial unbiased GANs — Simple TensorFlow Implementation [Paper] : Toward Spatially Unbiased Generative Models (ICCV 2021) Abstract Recent image gener

Junho Kim 16 Apr 15, 2022
Social Fabric: Tubelet Compositions for Video Relation Detection

Social-Fabric Social Fabric: Tubelet Compositions for Video Relation Detection This repository contains the code and results for the following paper:

Shuo Chen 7 Aug 09, 2022
Stitch it in Time: GAN-Based Facial Editing of Real Videos

STIT - Stitch it in Time [Project Page] Stitch it in Time: GAN-Based Facial Edit

1.1k Jan 04, 2023
Implementation of the state-of-the-art vision transformers with tensorflow

ViT Tensorflow This repository contains the tensorflow implementation of the state-of-the-art vision transformers (a category of computer vision model

Mohammadmahdi NouriBorji 2 Mar 16, 2022
This repo is to be freely used by ML devs to check the GAN performances without coding from scratch.

GANs for Fun Created because I can! GOAL The goal of this repo is to be freely used by ML devs to check the GAN performances without coding from scrat

Sagnik Roy 13 Jan 26, 2022
Controlling Hill Climb Racing with Hand Tacking

Controlling Hill Climb Racing with Hand Tacking Opened Palm for Gas Closed Palm for Brake

Rohit Ingole 3 Jan 18, 2022
Official repo for QHack—the quantum machine learning hackathon

Note: This repository has been frozen while we consider the submissions for the QHack Open Hackathon. We hope you enjoyed the event! Welcome to QHack,

Xanadu 118 Jan 05, 2023
Open source annotation tool for machine learning practitioners.

doccano doccano is an open source text annotation tool for humans. It provides annotation features for text classification, sequence labeling and sequ

7.1k Jan 01, 2023
PointPillars inference with TensorRT

A project demonstrating how to use CUDA-PointPillars to deal with cloud points data from lidar.

NVIDIA AI IOT 315 Dec 31, 2022
Interactive Visualization to empower domain experts to align ML model behaviors with their knowledge.

An interactive visualization system designed to helps domain experts responsibly edit Generalized Additive Models (GAMs). For more information, check

InterpretML 83 Jan 04, 2023
Galactic and gravitational dynamics in Python

Gala is a Python package for Galactic and gravitational dynamics. Documentation The documentation for Gala is hosted on Read the docs. Installation an

Adrian Price-Whelan 101 Dec 22, 2022