A Web Scraping Program.

Overview

Web Scraping

AUTHOR: Saurabh G.
MTech Information Security, IIT Jammu.


If you find this repository useful.
I would appreciate if you Star it and Fork it !


This project is a part of Lab Tutorial for Data Organization and Retrieval course.
This tutorial is to be followed by MTech Data Science students of IIT Jammu, Batch 2021.


Objective
The objective of this tutorial is to help the students understand the basics of web scraping.


HOW TO RUN THIS PROJECT

Import the project in Pycharm IDE and run the "main.py" file. Use the "Add interpreter" of pycharm and set the path to "venv" folder provided in this repository.

The project will run !


Slides used for this lab can be found in the link below

Link to slides


Suggested Tutorial for Prerequisite

  1. Python: https://www.youtube.com/watch?v=_uQrJ0TkZlc
  2. Python file Handling: https://www.w3schools.com/python/python_file_handling.asp
  3. Beautiful Soup: https://www.crummy.com/software/BeautifulSoup/bs4/doc/

Suggested Articles

  1. Get and Post in python https://www.geeksforgeeks.org/get-post-requests-using-python/
  2. https://2.python-requests.org/en/master/user/advanced/#id2
  3. https://www.nylas.com/blog/use-python-requests-module-rest-apis/
  4. response methods : https://www.geeksforgeeks.org/response-methods-python-requests/
  5. user Agent : https://www.whatismybrowser.com/detect/what-is-my-user-agent
  6. https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent
  7. https://www.howtogeek.com/114937/htg-explains-whats-a-browser-user-agent/
  8. https://www.geeksforgeeks.org/python-string-strip/
  9. https://www.geeksforgeeks.org/python-list-index/
Owner
Saurabh G.
Research Interests: Federated Learning, Split Learning, SplitFed, Privacy Preserving AI
Saurabh G.
此脚本为 python 脚本,实现原理为利用 selenium 定位相关元素,再配合点击事件完成浏览器的自动化.

此脚本为 python 脚本,实现原理为利用 selenium 定位相关元素,再配合点击事件完成浏览器的自动化.

N0el4kLs 5 Nov 19, 2021
Example of scraping a paginated API endpoint and dumping the data into a DB

Provider API Scraper Example Example of scraping a paginated API endpoint and dumping the data into a DB. Pre-requisits Python = 3.9 Pipenv Setup # i

Alex Skobelev 1 Oct 20, 2021
a Scrapy spider that utilizes Postgres as a DB, Squid as a proxy server, Redis for de-duplication and Splash to render JavaScript. All in a microservices architecture utilizing Docker and Docker Compose

This is George's Scraping Project To get started cd into the theZoo file and run: chmod +x script.sh then: ./script.sh This will spin up a Postgres co

George Reyes 7 Nov 27, 2022
Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js

Gerapy Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django and Vue.js. Documentation Documentation

Gerapy 2.9k Jan 03, 2023
A simple reddit scraper to get memes (only images) from r/ProgrammerHumor.

memey A simple reddit scraper to get memes (only images) from r/ProgrammerHumor. Note Only works if you have firefox installed (yet). Instructions foo

2 Nov 16, 2021
A dead simple crawler to get books information from Douban.

Introduction A dead simple crawler to get books information from Douban. Pre-requesites Python 3 Install dependencies from requirements.txt (Optional)

Yun Wang 1 Jan 10, 2022
Crawl the information of a given keyword on Google search engine

Crawl the information of a given keyword on Google search engine

4 Nov 09, 2022
Basic-html-scraper - A complete how to of web scraping with Python for beginners

basic-html-scraper Code from YT Video This video includes a complete how to of w

John 12 Oct 22, 2022
Displays market info for the LUNI token on the Terra Blockchain

LuniBot for Discord Displays market info for the LUNI/LUNA token on the Terra Blockchain (Webscrape method currently scraping CoinMarketCap). Will evo

0 Jan 22, 2022
Instagram_scrapper - This project allow you to scrape the list of followers, following or both from a public Instagram account, and create a csv or excel file easily.

Instagram_scrapper This project allow you to scrape the list of followers, following or both from a public Instagram account, and create a csv or exce

Lakhdar Belkharroubi 5 Oct 17, 2022
Discord webhook spammer with proxy support and proxy scraper

Discord webhook spammer with proxy support and proxy scraper

3 Feb 27, 2022
Crawler do site Fundamentus.com com o uso do framework scrapy, tanto da aba detalhada como a de resumo.

Crawler do site Fundamentus.com com o uso do framework scrapy, tanto da aba detalhada como a de resumo. (Todas as infomações)

Guilherme Silva Uchoa 3 Oct 04, 2022
mlscraper: Scrape data from HTML pages automatically with Machine Learning

🤖 Scrape data from HTML websites automatically with Machine Learning

Karl Lorey 798 Dec 29, 2022
A way to scrape sports streams for use with Jellyfin.

Sportyfin Description Stream sports events straight from your Jellyfin server. Sportyfin allows users to scrape for live streamed events and watch str

axelmierczuk 38 Nov 05, 2022
Python script to check if there is any differences in responses of an application when the request comes from a search engine's crawler.

crawlersuseragents This Python script can be used to check if there is any differences in responses of an application when the request comes from a se

Podalirius 13 Dec 27, 2022
This project was created using Python technology and flask tools to scrape a music site

python-scrapping This project was created using Python technology and flask tools to scrape a music site You need to install the following packages to

hosein moradi 1 Dec 07, 2021
Web Scraping COVID 19 Meta Portal with Python

Web-Scraping-COVID-19-Meta-Portal-with-Python - Requests API and Beautiful Soup to scrape real-time COVID statistics from worldometer website and perform data cleaning and visual analysis in Jupyter

Aarif Munwar Jahan 1 Jan 04, 2022
Binance Smart Chain Contract Scraper + Contract Evaluator

Pulls Binance Smart Chain feed of newly-verified contracts every 30 seconds, then checks their contract code for links to socials.Returns only those with socials information included, and then submit

14 Dec 09, 2022
Web Scraping Framework

Grab Framework Documentation Installation $ pip install -U grab See details about installing Grab on different platforms here http://docs.grablib.

2.3k Jan 04, 2023
茅台抢购最新优化版本,茅台秒杀,优化了抢购协程队列

茅台抢购最新优化版本,茅台秒杀,优化了抢购协程队列

MaoTai 33 Sep 03, 2022