Project: Netflix Data Analysis and Visualization with Python

MyNetflixDashboard

Table of Contents

  1. General Info
  2. Installation
  3. Demo
  4. Usage and Main Functionalities
  5. Contributing

General Info

This is a compact data visualization project I worked on for fun and to deepen my knowledge of visualizations and graphs using Python libraries.

From conception and design to every line of code, I built the entire dashboard myself. During this project, I was able to revisit and deepen what I had previously learned in my Data Science course of study. In particular, I was able to familiarize myself with pandas and work on my data visualization skills, which I greatly enjoyed!

The dataset I used for the Netflix data analytics task consists of my personal Netflix data, which I requested through their website. You can get access to your own data through this link. Feel free to download it and use my code to look into your own viewing behaviour :)

Installation

Requirements: Make sure you have Python 3.7+ installed on your computer. You can download the latest version of Python here.

Required packages:

  • pandas
  • dash
  • dash_bootstrap_components
  • plotly.express
  • plotly.graph_objects

Demo

Demo_MyNetflixDashboard_komprimiert.mov

Usage and Main Functionalities

Want to know more about your own Netflix behaviour? To try the dashboard, you can download your own Netflix data. Just follow this link and Netflix will send you your personal data.

Please also refer to the comments within the code itself to get more information on the functionalities of the program.


0. Preparing the data for analysis

  • This part cleans up the original data and prepares it for analysis.
  • In the process, columns that are not needed are dropped.
  • Time data is converted into appropriate time formats and split into several columns. The days of the week are added.
  • In addition, the titles of the movies/series are split (title, season number, episode name).
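The preparation steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names ("Start Time", "Duration", "Title", "Device Type"), the sample rows, and the "Title: Season: Episode" pattern are assumptions about the Netflix export and may differ in your own download.

```python
import io

import pandas as pd

# Tiny stand-in for the Netflix viewing-activity export.
# Column names and rows are assumptions made for illustration only.
SAMPLE_CSV = """Start Time,Duration,Title,Device Type
2021-03-06 20:15:00,00:45:10,Dark: Season 1: Secrets,Smart TV
2021-03-07 21:00:00,01:52:03,Roma,Smart TV
2022-01-12 19:30:00,00:22:41,Dark: Season 2: Beginnings and Endings,Phone
"""


def prepare(raw: pd.DataFrame) -> pd.DataFrame:
    """Clean the raw export and add time and title columns."""
    # Drop columns that are not needed for the analysis.
    df = raw.drop(columns=["Device Type"])

    # Convert the time data into proper datetime / timedelta types.
    df["Start Time"] = pd.to_datetime(df["Start Time"])
    df["Duration"] = pd.to_timedelta(df["Duration"])

    # Split the timestamp into several columns and add the weekday.
    df["year"] = df["Start Time"].dt.year
    df["month"] = df["Start Time"].dt.month
    df["weekday"] = df["Start Time"].dt.day_name()

    # Split "Title: Season N: Episode" into at most three parts.
    # Movies have no season or episode, so those cells stay empty.
    parts = df["Title"].str.split(": ", n=2, expand=True)
    parts = parts.reindex(columns=range(3))
    df["title"] = parts[0]
    df["season"] = parts[1]
    df["episode"] = parts[2]
    return df


prepared = prepare(pd.read_csv(io.StringIO(SAMPLE_CSV)))
print(prepared[["title", "season", "episode", "weekday"]])
```

To run this on real data, replace the in-memory sample with `pd.read_csv(...)` pointing at your own export file and adjust the dropped columns to whatever your file contains.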

1. Analysis

  • This part of the code is about analyzing the data.
  • We find out how many movies or series were watched over the entire period. We also count the total number of hours Netflix was watched.
  • A pie chart is created that shows how viewing is distributed across the days of the week.
  • In addition, the top 10 series that were watched the longest (in terms of total duration) are displayed.
  • A line chart shows Netflix viewing behavior over the years, counting the total number of hours Netflix was watched.
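The aggregations behind these figures could look roughly like the sketch below. The DataFrame `views` and its columns (`title`, `start`, `duration`) are hypothetical stand-ins for the prepared data, not the names used in the project.

```python
import pandas as pd

# Hypothetical, already-prepared viewing data (names are assumptions).
views = pd.DataFrame({
    "title": ["Dark", "Dark", "Roma", "Ozark", "Dark"],
    "start": pd.to_datetime([
        "2020-05-02 20:00", "2020-05-03 21:00", "2021-03-07 21:00",
        "2021-11-19 22:00", "2022-01-12 19:30",
    ]),
    "duration": pd.to_timedelta([
        "00:45:00", "00:50:00", "02:15:00", "01:00:00", "00:55:00",
    ]),
})
views["hours"] = views["duration"].dt.total_seconds() / 3600

# How many different movies/series were watched, and for how long in total.
total_titles = views["title"].nunique()
total_hours = views["hours"].sum()

# Hours per weekday (input for the pie chart).
hours_by_weekday = views.groupby(views["start"].dt.day_name())["hours"].sum()

# Top 10 titles by total watch time.
top_titles = views.groupby("title")["hours"].sum().nlargest(10)

# Hours per year (input for the line chart).
hours_by_year = views.groupby(views["start"].dt.year)["hours"].sum()

print(total_titles, round(total_hours, 2))
print(top_titles)
```

Each resulting Series can then be handed to a plotting function, for example `plotly.express.pie` for the weekday split or `plotly.express.line` for the yearly totals.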

NetflixOverTime

2. Dash App Layout

  • Plotly's Dash is used to create an interactive dashboard of the Netflix data.
  • The individual graphics and texts are arranged in rows and containers.
  • This part also includes a dropdown menu that the user can interact with.

3. App Callback

  • Here we connect an interactive bar chart to the Dash Components.
  • The chart shows the total hours of Netflix watched per month. It can be filtered by year.

MonthlyViews

Contributing

Your comments, suggestions, and contributions are welcome. Please feel free to contribute pull requests or create issues for bugs and feature requests.

Owner
Kathrin Hälbich
Data Science Student and PR- & Marketing-Expert