Web Downloader With Python

Overview

Web Downloader

Introduction

This module will provide API to download the webpage components : html file, image file, css fil, javascript file, href link file based on the input url (the url must start with 'http' or 'https' ).

To prosses multiple URLs at the same time, The user can list all the url he wants to download in the file "urllist.txt" as shown below:

# Add the URL you want to download line by line(The url must start with 'http' or 'https' ):
# example: https://www.google.com
https://www.google.com
https://www.carousell.sg/
https://www.google.com/search?q=github&sxsrf=AOaemvJh3t5_h8H85AE8Ajbb1IMnBrRISA%3A1636698503535&source=hp&ei=hwmOYY6mHdGkqtsPq8S9sAY&iflsig=ALs-wAMAAAAAYY4Xl7GLWS16_xc2Q9XrG0p3q277DpkL&oq=&gs_lcp=Cgdnd3Mtd2l6EAEYADIHCCMQ6gIQJzIHCCMQ6gIQJzIHCCMQ6gIQJzIHCCMQ6gIQJzIHCCMQ6gIQJzIHCCMQ6gIQJzINCC4QxwEQowIQ6gIQJzIHCCMQ6gIQJzIHCCMQ6gIQJzIHCCMQ6gIQJ1AAWABgjgdoAXAAeACAAQCIAQCSAQCYAQCwAQo&sclient=gws-wiz
https://stackoverflow.com/questions/66022042/how-to-let-kubernetes-pod-run-a-local-script/66025424

Program Setup

Development Environment : python 3.7.4
Additional Lib/Software Need
  1. beautifulsoup4 4.10.0

    install:

    pip install beautifulsoup4
    

    Lib link: https://pypi.org/project/beautifulsoup4/

Hardware Needed : None
Program File List

version: v0.1

Program File Execution Env Description
webDownload.py python 3 Main executable program use the API.
urllist.txt url record list.

Program Usage

Module API Usage
  1. Downloader init:
soup = urlDownloader(imgFlg=True, linkFlg=True, scriptFlg=True)
  • imgFlg: Set to "True" to download all the "" tag files.
  • linkFlg: Set to "True" to download all the html section, image, icon, css file imported by ""
  • scriptFlg: set to "True" to download all the js file.
  1. Call API method savePage to scape url and save the data in a folder

    soup.savePage('
         
          ', '
          
           ')
    
    # Exampe:
    soup.savePage('https://www.google.com', 'www_google_com')
    
          
         
Program Execution
  1. Copy the url you want to check in the url record file "urllist.txt"

  2. Cd to the program folder and run program execution cmd:

    python webDownload.py
    
  3. Check the result:

    For example, if you copy the url "https://www.carousell.sg/" as the first url you want to check into the file "urllist.txt" file, all the html files, image file and js files will be under folder "1_www.carousell.sg_files"

    • The main web page will be saved as: "1_www.carousell.sg_files/1_www.carousell.sg.html"
    • The image used in the page will be saved in folder: "1_www.carousell.sg_files/img"
    • The html/imge/css import by href will be saved in folder: "1_www.carousell.sg_files/link"
    • The js file used by the page will be saved in fodler: "1_www.carousell.sg_files/script"

Problem and Solution

Problem[0]: Files download got slight different

Why there is a slight different between the files which download by using the program and the files which downlaod I use some-webBrowser's "page save as " for the same URL such as www.google.com

OS Platform : n.a

Error Message: n.a

Type: n.a

Solution:

This is normal situation, the logic of web scrape and browser display are different: if you type www.google.ccom if different people's browser, you can see the page shown on different browser are also different. This is because the browser cache, token in the local storage , cookie will make influence of the "GET" request. So when different people type in the google URL in their browser, they can see their own Gmail Icon shows on the right top corner. If you remove all the cache, token in the local storage , cookie of your browser and try "page save as ", the file downloaded by "page save as " should be same as the program.

Problem[2]: Some download Image are empty

OS Platform : n.a

Error Message: n.a

Type: n.a

Solution:

If a web use third party's storage to save the image and the net-storage need to authorization before download, our program download request will be reject and got 'null' when download the file. Then the saved image will be empty.


Last edit by LiuYuancheng([email protected]) at 13/11/2021

Tool To download 4KHDR DV SDR from AppleTV

# APPLE-TV 4K Downloader Tool To download 4K HDR DV SDR from AppleTV Hello Fellow Developers/ ! Hi! My name is WVDUMP. I am Leaking the scripts to

5 Dec 25, 2021
Download Web-10K data by querying Bing Image Search

gpv2-web10k This repository contains the script to download images from the Web-10K dataset. The script takes in a list of queries, queries Bing Image

AI2 8 Sep 06, 2022
A Celery application to collect data, download media and extract information from social media APIs

Project IBEX A Celery application to collect data, download media and extract information from social media APIs. Requirements You must have a Redis D

ibex 4 Dec 15, 2022
Download all posts and comments in a subreddit

subreddit downloader This subreddit downloader downloads all posts and comments in a subreddit For a tutorial to use this program please follow this m

Guneet 6 Dec 16, 2022
New York Times Front Page Downloader.

TIMETRAVELER New York Times Front Page Downloader. Usage python3 timetraveler.py All data will be saved at ~/timetraveler/ Goals To keep a historica

Daeshon Jones 0 Oct 31, 2021
A discord bot for downloading youtube video and audio files

disctube disctube is a discord bot for downloading video and audio files from youtube using python pytube. disclaimer i am not the best python program

razor420 3 Feb 03, 2022
Twitter Media Downloader (Telegram Bot)

Twitter Media Downloader (Telegram Bot)

Matin Baloochestani 8 Oct 27, 2022
The PornHub Downloader is a powerfull script used to download and manage both videos and pictures

The PornHub Downloader is a powerfull script used to download and manage both videos and pictures

16 Aug 31, 2022
Utility for downloading works from AO3 (Archive Of Our Own)

froyo A small graphical application for batch downloading works from Archive Of Our Own (AO3). Curate a fic repo of your own today :) Features Batch d

flux 24 Dec 09, 2022
A user-friendly GUI for the ZSpotify music downloader.

ZSpotifyGUI A user-friendly desktop app for ZSpotify music downloader for Windows, MacOs, and Linux Discord Server - Matrix Server - Gitea Mirror - Ma

94 Dec 17, 2022
Download all posts and comments in a subreddit

subreddit downloader This subreddit downloader downloads all posts and comments in a subreddit For a tutorial to use this program please follow this m

Guneet 6 Dec 16, 2022
A Udemy downloader that can download DRM protected videos and non-DRM protected videos.

Udemy Downloader with DRM support NOTE This program is WIP, the code is provided as-is and i am not held resposible for any legal repercussions result

Puyodead1 468 Dec 29, 2022
一个在新番更新后第一时间在dmhy等BT下载站自动下载的小工具.

Anime Track 一个在新番更新后第一时间自动下载的小工具. 可以根据自定义的关键字在dmhy等BT下载站在搜索结果更新时将磁力链发送至aria2实现自动下载. 基本功能包含: 将BT下载站的某个关键字的搜索结果的所有磁力链添加至ARIA2 自动更新aria2 trackers 将已添加的磁力

Sunky 24 Oct 12, 2022
Web Downloader With Python

Web Downloader Introduction This module will provide API to download the webpage components : html file, image file, css fil, javascript file, href li

3 Dec 28, 2022
YouTube Video publisher using youtube-dl & ROS2🐢

YouTube-publisher-ROS2 Publish sensor_msgs/Image by "YouTube" 🤗 🤗 🤗 ! You don't have to use webcamera or your video to check demos. Purpose Quick d

Ar-Ray 5 Dec 04, 2022
A program that can download animations from myself website

MYD A program that can download animations from myself website 一個可以用來下載Myself網站上動漫的程式 Quick Start [無GUI版本] 確定電腦內包含 ffmpeg 並設為環境變數 (Environment Variabl

Patrick_star 1 Nov 07, 2021
Make YouTube videos tasks in Todoist faster and time efficient!

Youtubist Basically fork of yt-dlp python module to my needs. You can paste playlist or channel link on the YouTube. It will automatically format to s

Konrad Konieczny 1 Dec 04, 2022
Youtube video downloader and info extractor for python.

tube_dl Tube_dl is a Simple Youtube video downloader for Python. A Modular approach to bypass and download Youtube Videos and Playlist from Youtube us

Shekhar Chander 16 Jul 09, 2022
Python utility to download jobs at seek.com.au

Job Seeker job_seeker is an utility to download data of a job search from seek.com.au into a csv file for data analysis and exploration Install using

PyBites 3 May 14, 2022