This program scrapes information and images for movies and TV shows.

Last update: Dec 05, 2021

Media-WebScraper

Summary

For more information on the program, read the WebScrape_help text file (this can also be accessed while running the program).

For a given list of media, the program scrapes and saves general information and images for each title, plus episode information for TV shows when requested.

General Information (default):

Saved as a .txt file

This will scrape general information:

Title
Release date
Runtime
Genre
Director
Cast
Plot description

Additional information saved:

Source database used for scrape
ID for media in source database
Poster image link

Images (default):

Saved as a .jpg file

This will scrape the poster.

Episode Information (if specified):

Saved as a .csv file

This will scrape information for each episode for a TV show:

Season number
Episode number
Episode title
Episode air date
Episode description
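
As a rough illustration of the episode output described above, the following sketch writes one row per episode with Python's standard csv module. The column names, the `write_episodes` helper and the sample row are assumptions made for this example; they are not taken from the program's source, and the real file layout may differ.

```python
import csv
import io

# Hypothetical column names mirroring the episode fields listed above.
EPISODE_FIELDS = ["season", "episode", "title", "air_date", "description"]


def write_episodes(rows, fileobj):
    """Write episode dicts to fileobj as CSV, starting with a header row."""
    writer = csv.DictWriter(fileobj, fieldnames=EPISODE_FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Placeholder data for demonstration only, not real scraped output.
sample = [
    {
        "season": 1,
        "episode": 1,
        "title": "Pilot",
        "air_date": "2020-01-01",
        "description": "Placeholder, with a comma.",
    },
]

# An in-memory buffer stands in for the .csv file on disk.
buffer = io.StringIO(newline="")
write_episodes(sample, buffer)
csv_text = buffer.getvalue()
```

Using `csv.DictWriter` means descriptions that contain commas or quotes are escaped automatically, which matters for free-text fields such as the episode description.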

Features:

Multithreaded scraping of the media list, which greatly reduces the time taken for large lists.
Can generate a media list from folders and files in a specified directory or from user input.
Can specify save location for scraped data.
Can specify search tags for media list for a more accurate scrape.
Can choose to scrape all episode information for a TV show.
Can detect data that has already been scraped, which makes adding new media to a previously scraped list very efficient.
Can recover individual missing files without rescraping all data.
Can retry the scrape before exiting the program if there were any incomplete scrapes (successfully scraped files will not be altered or rescraped).
Currently only supports scraping data from IMDb.
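
Three of the features above (building a media list from a directory, skipping media that is already scraped, and multithreaded scraping) can be sketched together as follows. This is a minimal sketch under stated assumptions, not the program's actual implementation: the function names are hypothetical, `scrape_one` is a stand-in that writes a placeholder file instead of contacting IMDb, and "already scraped" is assumed to mean that the title's .txt file exists.

```python
from concurrent.futures import ThreadPoolExecutor
import os
import tempfile


def build_media_list(directory):
    """Use the names of folders and files in a directory as media titles."""
    titles = []
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        # Drop the extension of files (e.g. ".mkv"); keep folder names as-is.
        name = os.path.splitext(entry)[0] if os.path.isfile(path) else entry
        titles.append(name)
    return titles


def already_scraped(title, save_dir):
    """Assumed check: a title counts as done if its .txt file exists."""
    return os.path.exists(os.path.join(save_dir, title + ".txt"))


def scrape_one(title, save_dir):
    """Stand-in for the real network scrape; writes a placeholder file."""
    target = os.path.join(save_dir, title + ".txt")
    with open(target, "w", encoding="utf-8") as handle:
        handle.write("Title: " + title + "\n")
    return title


def scrape_all(titles, save_dir, max_workers=8):
    """Scrape only the missing titles, using a pool of worker threads."""
    pending = [t for t in titles if not already_scraped(t, save_dir)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input list.
        return list(pool.map(lambda t: scrape_one(t, save_dir), pending))


# Demonstration in throwaway directories.
with tempfile.TemporaryDirectory() as media_dir, \
        tempfile.TemporaryDirectory() as save_dir:
    os.mkdir(os.path.join(media_dir, "Show A"))
    open(os.path.join(media_dir, "Movie B.mkv"), "w").close()
    titles = build_media_list(media_dir)
    first_run = scrape_all(titles, save_dir)   # scrapes both titles
    second_run = scrape_all(titles, save_dir)  # nothing left to scrape
```

Threads suit this workload because scraping is dominated by waiting on network responses rather than by computation, so several requests can be in flight at once.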

Usage:

Currently a terminal-based program.

Running the program using Python:

Requirements: Python 3.2+ (additional libraries: requests, beautifulsoup4)

Running the program from bundled executable file (created using pyinstaller):

Requirements: Windows 10
While running, the executable creates a 'temp' folder, in the same location as the program, containing extracted libraries and support files.
- The temporary files are deleted automatically, but if the program is closed abruptly, they will remain.
- The 'temp' folder can be manually deleted after closing the program.
- (As of pyinstaller v4.7, a one-file bundled executable will leave any temp '_MEIxxxxxx' folders if the program is force closed)
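
To check whether a forced close left anything behind, a small script along these lines could list the leftover folders. It is only a sketch: the `find_leftover_mei_dirs` helper is hypothetical, and it assumes the leftovers sit directly inside the folder you pass in (for example the 'temp' folder next to the program). It only reports what it finds; deleting is left to the user, after the program has been closed.

```python
import os
import tempfile


def find_leftover_mei_dirs(temp_root):
    """Return paths of '_MEI*' directories found directly under temp_root."""
    leftovers = []
    for entry in sorted(os.listdir(temp_root)):
        path = os.path.join(temp_root, entry)
        if entry.startswith("_MEI") and os.path.isdir(path):
            leftovers.append(path)
    return leftovers


# Demonstration against a throwaway directory, not a real 'temp' folder.
with tempfile.TemporaryDirectory() as temp_root:
    os.mkdir(os.path.join(temp_root, "_MEI123456"))
    os.mkdir(os.path.join(temp_root, "other"))
    found = [os.path.basename(p) for p in find_leftover_mei_dirs(temp_root)]
```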

Updates:

For information on version history, read the HISTORY markdown file.

Releases (v1.3.0)

v1.3.0 (Dec 5, 2021)
WebScrape v1.3.0

See version history document for all changes.

Running the program using Python:

Download the source code.

Requirements:

Python 3.2+ (additional libraries: requests, beautifulsoup4)

Running the program from bundled executable:

Download the WebScrape-1.3.0 zip file containing the bundled executable (created using pyinstaller).

Requirements:

Windows 10

Note:

The executable file creates a 'temp' folder containing extracted libraries and support files in the same location as the program while running.

The temporary files are deleted automatically, but if the program is closed abruptly, they will remain.

The 'temp' folder can be manually deleted after closing the program.

(As of pyinstaller v4.7, a one-file bundled executable will leave any temp '_MEIxxxxxx' folders if the program is force closed)

WebScrape-1.3.0.zip(8.71 MB)
