A web scraping pipeline project that retrieves TV and movie data from two sources, then transforms and stores data in a MySQL database.

Last update: Mar 28, 2022

Overview

New to Streaming Scraper

An in-progress web scraping project built with Python, R, and SQL.

The scraped data are movie and TV show information. The goal of the project is to show the new-to-streaming titles that arrive on Netflix each month, along with additional details such as critic and audience ratings.

Current stage: Working out how to present the data with R Markdown.

Testing at: https://charlesdungy.github.io/new-to-streaming-scraper/

Future stage: Complete the documentation and code comments.

Description

Data are retrieved from two different data sources: What's on Netflix (WON) and Rotten Tomatoes (RT). RT data are cleaned and transformed with Python, while WON data are cleaned and transformed with R.
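As a rough illustration of the Python cleaning step for RT data, here is a minimal sketch. The field names (`title`, `critic_score`, `audience_score`), the `RatingRecord` class, and both helper functions are hypothetical and chosen for illustration; they are not taken from the project's code, and the real scraped fields may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RatingRecord:
    # Hypothetical shape of one cleaned RT row (an assumption, not the project's schema).
    title: str
    critic_score: Optional[int]
    audience_score: Optional[int]


def parse_score(raw: Optional[str]) -> Optional[int]:
    """Turn a string such as ' 93% ' into the integer 93; return None when no score is present."""
    if raw is None:
        return None
    digits = raw.strip().rstrip("%").strip()
    return int(digits) if digits.isdigit() else None


def clean_record(raw: dict) -> RatingRecord:
    """Collapse extra whitespace in the title and convert both scores to integers."""
    title = " ".join(raw.get("title", "").split())
    return RatingRecord(
        title=title,
        critic_score=parse_score(raw.get("critic_score")),
        audience_score=parse_score(raw.get("audience_score")),
    )
```

A missing or placeholder score (for example `"--"`) becomes `None` rather than raising, so a title without ratings can still be stored.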

All data are piped into a MySQL database, then retrieved for presentation in R.
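The load-then-retrieve step might look something like the sketch below. Two substitutions are made to keep it self-contained: Python's built-in `sqlite3` stands in for MySQL, and the retrieval is written in Python although the project does that part in R. The table names (`won_titles`, `rt_ratings`), their columns, and the join on `title` are assumptions for illustration, not the project's actual schema.

```python
import sqlite3


def build_database(won_rows, rt_rows):
    """Create an in-memory database (stand-in for MySQL) and load both cleaned data sets."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical tables: one per data source.
    conn.execute("CREATE TABLE won_titles (title TEXT PRIMARY KEY, arrival_date TEXT)")
    conn.execute(
        "CREATE TABLE rt_ratings ("
        "title TEXT PRIMARY KEY, critic_score INTEGER, audience_score INTEGER)"
    )
    # Parameterized inserts avoid building SQL strings by hand.
    conn.executemany("INSERT INTO won_titles VALUES (?, ?)", won_rows)
    conn.executemany("INSERT INTO rt_ratings VALUES (?, ?, ?)", rt_rows)
    conn.commit()
    return conn


def titles_with_ratings(conn):
    """Return each new title with its ratings, keeping titles that have no rating yet."""
    query = """
        SELECT w.title, w.arrival_date, r.critic_score, r.audience_score
        FROM won_titles AS w
        LEFT JOIN rt_ratings AS r ON r.title = w.title
        ORDER BY w.arrival_date, w.title
    """
    return conn.execute(query).fetchall()
```

The `LEFT JOIN` keeps every WON title in the result even when no matching RT rating exists; the rating columns then come back as `None`.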

Here is a high-level look at the pipeline:

[Figure: pipeline diagram. Data Source 1 is WON data; Data Source 2 is RT data.]

Main Packages/Tools

Python

R

SQL

MySQL

Current Directory Tree

License

MIT

Owner

Charles Dungy
