Python for Data Analysis, 2nd Edition

Materials and IPython notebooks for "Python for Data Analysis" by Wes McKinney, published by O'Reilly Media

Follow Wes on Twitter:

1st Edition Readers

If you are reading the 1st Edition (published in 2012), please find the reorganized book materials on the 1st-edition branch.

Translations

Chinese by Xu Liang
Polish by Michal Biesiada

IPython Notebooks:

Chapter 2: Python Language Basics, IPython, and Jupyter Notebooks
Chapter 3: Built-in Data Structures, Functions, and Files
Chapter 4: NumPy Basics: Arrays and Vectorized Computation
Chapter 5: Getting Started with pandas
Chapter 6: Data Loading, Storage, and File Formats
Chapter 7: Data Cleaning and Preparation
Chapter 8: Data Wrangling: Join, Combine, and Reshape
Chapter 9: Plotting and Visualization
Chapter 10: Data Aggregation and Group Operations
Chapter 11: Time Series
Chapter 12: Advanced pandas
Chapter 13: Introduction to Modeling Libraries in Python
Chapter 14: Data Analysis Examples
Appendix A: Advanced NumPy

License

Code

The code in this repository, including all code samples in the notebooks listed above, is released under the MIT license. Read more at the Open Source Initiative.

