Statistical Rethinking course winter 2022

Last update: Dec 31, 2022

Overview

Statistical Rethinking (2022 Edition)

Instructor: Richard McElreath

Lectures: Pre-recorded and uploaded to the <Playlist>, two per week

Discussion: Online, Fridays 3pm-4pm Central European Time

Purpose

This course teaches data analysis, but it focuses on scientific models first. The unfortunate truth about data is that nothing much can be done with it until we say what caused it. We will prioritize conceptual, causal models and precise questions about those models. We will use Bayesian data analysis to connect scientific models to evidence. And we will learn powerful computational tools for coping with the high-dimensional, imperfect data that biologists and social scientists face.

Format

Online, flipped instruction. The lectures are pre-recorded. We'll meet online once a week for an hour to work through the solutions to the assigned problems.

We'll use the 2nd edition of my book, <Statistical Rethinking>. I'll provide a PDF of the book to enrolled students.

Registration: Please sign up via <[COURSE IS FULL SORRY]>. I've also set aside 100 audit tickets at the same link, for people who want to participate, but who don't need graded work and course credit.

Calendar & Topical Outline

There are 10 weeks of instruction. Links to lecture recordings will appear in this table. Weekly problem sets are assigned on Fridays and due the next Friday, when we discuss the solutions in the weekly online meeting.

Lecture playlist on YouTube: <Statistical Rethinking 2022>

Week ##	Meeting date	Reading	Lectures
Week 01	07 January	Chapters 1, 2 and 3	[1] <The Golem of Prague> <(Slides)> [2] <Bayesian Inference> <(Slides)>
Week 02	14 January	Chapters 4 and 5	[3] <Basic Regression> <(Slides)> [4] <Categories & Curves> <(Slides)>
Week 03	21 January	Chapters 5 and 6	[5] <Elemental Confounds> <(Slides)> [6] <Good & Bad Controls> <(Slides)>
Week 04	28 January	Chapters 7 and 8	[7] Overfitting [8] Interactions
Week 05	04 February	Chapters 9, 10 and 11	[9] Markov chain Monte Carlo [10] Binomial GLMs
Week 06	11 February	Chapters 11 and 12	[11] Poisson GLMs [12] Ordered Categories
Week 07	18 February	Chapter 13	[13] Multilevel Models [14] Multi-Multilevel Models
Week 08	25 February	Chapter 14	[15] Varying Slopes [16] Gaussian Processes
Week 09	04 March	Chapter 15	[17] Measurement Error [18] Missing Data
Week 10	11 March	Chapters 16 and 17	[19] Beyond GLMs: State-space Models, ODEs [20] Horoscopes
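
The meeting dates in the table follow a simple weekly pattern: ten Fridays starting on 07 January 2022. As a quick sanity check, here is a minimal Python sketch that regenerates them. The names `FIRST_MEETING`, `NUM_WEEKS`, and `meeting_dates` are made up for this illustration and are not part of the course materials.

```python
from datetime import date, timedelta

# Values taken from the table above; the names are illustrative only.
FIRST_MEETING = date(2022, 1, 7)  # Week 01 meeting date
NUM_WEEKS = 10                    # "10 weeks of instruction"


def meeting_dates(first=FIRST_MEETING, weeks=NUM_WEEKS):
    """Return one meeting date per week, spaced exactly seven days apart."""
    return [first + timedelta(weeks=i) for i in range(weeks)]


if __name__ == "__main__":
    # Print the schedule in roughly the same shape as the table.
    for week, day in enumerate(meeting_dates(), start=1):
        print(f"Week {week:02d}\t{day:%d %B}\t{day:%A}")
```

Each problem set is due one week after it is assigned, so under this sketch a set's due date is just the next entry in the list.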

Coding

This course involves a lot of scripting. Students can engage with the material using either the original R code examples or one of several conversions to other computing environments. The conversions are not always exact, but they are rather complete. Each option is listed below.

Original R Flavor

To use the original R code examples from the print book, install the rethinking R package. The code is all on GitHub at https://github.com/rmcelreath/rethinking/, which also has additional details about the package, including how to use the more up-to-date cmdstanr instead of rstan as the underlying MCMC engine.

R + Tidyverse + ggplot2 + brms

The <Tidyverse/brms> conversion is very high quality and complete through Chapter 14.

Python and PyMC3

The <Python/PyMC3> conversion is quite complete.

Julia and Turing

The <Julia/Turing> conversion is not as complete, but is growing fast and presents the Rethinking examples in multiple Julia engines, including the great <TuringLang>.

Other

There are several other conversions. See the full list at https://xcelab.net/rm/statistical-rethinking/.

Homework and solutions

I will also post problem sets and solutions. Check the folders at the top of the repository.

Owner

Richard McElreath
