A workshop on data visualization in Python with notebooks and exercises for following along.

Overview

Beyond the Basics: Data Visualization in Python

Binder Nbviewer View slides in browser

The human brain excels at finding patterns in visual representations, which is why data visualizations are essential to any analysis. Done right, they bridge the gap between those analyzing the data and those consuming the analysis. However, learning to create impactful, aesthetically-pleasing visualizations can often be challenging. This session will equip you with the skills to make customized visualizations for your data using Python.

While there are many plotting libraries to choose from, the prolific Matplotlib library is always a great place to start. Since various Python data science libraries utilize Matplotlib under the hood, familiarity with Matplotlib itself gives you the flexibility to fine tune the resulting visualizations (e.g., add annotations, animate, etc.). This session will also introduce interactive visualizations using HoloViz, which provides a higher-level plotting API capable of using Matplotlib and Bokeh (a Python library for generating interactive, JavaScript-powered visualizations) under the hood.

Workshop Outline

This is a workshop on data visualization in Python first delivered at ODSC West 2021 and subsequently at ODSC East 2022 and PyCon Italia 2022. It's divided into the following sections:

Section 1: Getting Started With Matplotlib

We will begin by familiarizing ourselves with Matplotlib. Moving beyond the default options, we will explore how to customize various aspects of our visualizations. By the end of this section, you will be able to generate plots using the Matplotlib API directly, as well as customize the plots that libraries like pandas and Seaborn create for you.

Section 2: Moving Beyond Static Visualizations

Static visualizations are limited in how much information they can show. To move beyond these limitations, we can create animated and/or interactive visualizations. Animations make it possible for our visualizations to tell a story through movement of the plot components (e.g., bars, points, lines). Interactivity makes it possible to explore the data visually by hiding and displaying information based on user interest. In this section, we will focus on creating animated visualizations using Matplotlib before moving on to create interactive visualizations in the next section.

Section 3: Building Interactive Visualizations for Data Exploration

When exploring our data, interactive visualizations can provide the most value. Without having to create multiple iterations of the same plot, we can use mouse actions (e.g., click, hover, zoom, etc.) to explore different aspects and subsets of the data. In this section, we will learn how to use a few of the libraries in the HoloViz ecosystem to create interactive visualizations for exploring our data utilizing the Bokeh backend.


Prerequisites

You should have basic knowledge of Python and be comfortable working in Jupyter Notebooks. Check out this notebook for a crash course in Python or work through the official Python tutorial for a more formal introduction. The environment we will use for this workshop comes with JupyterLab, which is pretty intuitive, but be sure to familiarize yourself using notebooks in JupyterLab and additional functionality in JupyterLab. In addition, a basic understanding of pandas will be beneficial, but is not required; reviewing the first section of my pandas workshop will be sufficient.


Setup Instructions

  1. Install Anaconda/Miniconda. Note that you can use this Binder environment instead if you don't want to install anything on your machine.

  2. Fork this repository:

    location of fork button in GitHub

  3. Clone your forked repository:

    location of clone button in GitHub

  4. Create and activate a conda virtual environment (on Windows, these commands should be run in Anaconda Prompt):

    $ cd python-data-viz-workshop
    ~/python-data-viz-workshop$ conda install mamba -n base -c conda-forge
    ~/python-data-viz-workshop$ mamba env create --file environment.yml
    ~/python-data-viz-workshop$ conda activate data_viz_workshop
    (data_viz_workshop) ~/python-data-viz-workshop$
  5. Launch JupyterLab:

    (data_viz_workshop) ~/python-data-viz-workshop$ jupyter lab
  6. Navigate to the 0-check_your_env.ipynb notebook in the notebooks/ folder:

    open 0-check_your_env.ipynb

  7. Run the notebook to confirm everything is set up properly:

    check env


About the Author

Stefanie Molin (@stefmolin) is a software engineer and data scientist at Bloomberg in New York City, where she tackles tough problems in information security, particularly those revolving around data wrangling/visualization, building tools for gathering data, and knowledge sharing. She is also the author of Hands-On Data Analysis with Pandas, which is currently in its second edition. She holds a bachelor’s of science degree in operations research from Columbia University's Fu Foundation School of Engineering and Applied Science. She is currently pursuing a master’s degree in computer science, with a specialization in machine learning, from Georgia Tech. In her free time, she enjoys traveling the world, inventing new recipes, and learning new languages spoken among both people and computers.

Related Content

All examples herein were developed exclusively for this workshop. Hands-On Data Analysis with Pandas contains additional examples and exercises, as does this blog post and this workshop on pandas.

Owner
Stefanie Molin
Developer | Data Scientist | Author of "Hands-On Data Analysis with Pandas" | occasional hacker
Stefanie Molin
Generate visualizations of GitHub user and repository statistics using GitHub Actions.

GitHub Stats Visualization Generate visualizations of GitHub user and repository statistics using GitHub Actions. This project is currently a work-in-

Aditya Thakekar 1 Jan 11, 2022
Boltzmann visualization - Visualize the Boltzmann distribution for simple quantum models of molecular motion

Boltzmann visualization - Visualize the Boltzmann distribution for simple quantum models of molecular motion

1 Jan 22, 2022
Use Perspective to create the chart for the trader’s dashboard

Task Overview | Installation Instructions | Link to Module 3 Introduction Experience Technology at JP Morgan Chase Try out what real work is like in t

Abdulazeez Jimoh 1 Jan 22, 2022
A little word cloud generator in Python

Linux macOS Windows PyPI word_cloud A little word cloud generator in Python. Read more about it on the blog post or the website. The code is tested ag

Andreas Mueller 9.2k Dec 30, 2022
An XLSX spreadsheet renderer for Django REST Framework.

drf-renderer-xlsx provides an XLSX renderer for Django REST Framework. It uses OpenPyXL to create the spreadsheet and returns the data.

The Wharton School 166 Dec 01, 2022
The implementation of the paper "HIST: A Graph-based Framework for Stock Trend Forecasting via Mining Concept-Oriented Shared Information".

The HIST framework for stock trend forecasting The implementation of the paper "HIST: A Graph-based Framework for Stock Trend Forecasting via Mining C

Wentao Xu 111 Jan 03, 2023
Geocoding library for Python.

geopy geopy is a Python client for several popular geocoding web services. geopy makes it easy for Python developers to locate the coordinates of addr

geopy 3.8k Jan 02, 2023
A way of looking at COVID-19 data that I haven't seen before.

Visualizing Omicron: COVID-19 Deaths vs. Cases Click here for other countries. Data is from Our World in Data/Johns Hopkins University. About this pro

1 Jan 10, 2022
UNMAINTAINED! Renders beautiful SVG maps in Python.

Kartograph is not maintained anymore As you probably already guessed from the commit history in this repo, Kartograph.py is not maintained, which mean

1k Dec 09, 2022
Python scripts to manage Chia plots and drive space, providing full reports. Also monitors the number of chia coins you have.

Chia Plot, Drive Manager & Coin Monitor (V0.5 - April 20th, 2021) Multi Server Chia Plot and Drive Management Solution Be sure to ⭐ my repo so you can

338 Nov 25, 2022
A streamlit component for bi-directional communication with bokeh plots.

Streamlit Bokeh Events A streamlit component for bi-directional communication with bokeh plots. Its just a workaround till streamlit team releases sup

Ashish Shukla 123 Dec 25, 2022
Here I plotted data for the average test scores across schools and class sizes across school districts.

HW_02 Here I plotted data for the average test scores across schools and class sizes across school districts. Average Test Score by Race This graph re

7 Oct 27, 2021
GitHub English Top Charts

Help you discover excellent English projects and get rid of the interference of other spoken language.

kon9chunkit 529 Jan 02, 2023
3D Vision functions with end-to-end support for deep learning developers, written in Ivy.

Ivy vision focuses predominantly on 3D vision, with functions for camera geometry, image projections, co-ordinate frame transformations, forward warping, inverse warping, optical flow, depth triangul

Ivy 61 Dec 29, 2022
Pglive - Pglive package adds support for thread-safe live plotting to pyqtgraph

Live pyqtgraph plot Pglive package adds support for thread-safe live plotting to

Martin Domaracký 15 Dec 10, 2022
An application that allows you to design and test your own stock trading algorithms in an attempt to beat the market.

StockBot is a Python application for designing and testing your own daily stock trading algorithms. Installation Use the

Ryan Cullen 280 Dec 19, 2022
Mathematical learnings with Lean, for those of us who wish we knew more of both!

Lean for the Inept Mathematician This repository contains source files for a number of articles or posts aimed at explaining bite-sized mathematical c

Julian Berman 8 Feb 14, 2022
Graphical display tools, to help students debug their class implementations in the Carcassonne family of projects

carcassonne_tools Graphical display tools, to help students debug their class implementations in the Carcassonne family of projects NOTE NOTE NOTE The

1 Nov 08, 2021
clock_plot provides a simple way to visualize timeseries data, mapping 24 hours onto the 360 degrees of a polar plot

clock_plot clock_plot provides a simple way to visualize timeseries data mapping 24 hours onto the 360 degrees of a polar plot. For usage, please see

12 Aug 24, 2022
Plot toolbox based on Matplotlib, simple and elegant.

Elegant-Plot Plot toolbox based on Matplotlib, simple and elegant. 绘制效果 绘制过程 数据准备 每种图标类型的目录下有data.csv文件,依据样例数据填入自己的数据。

3 Jul 15, 2022