Web-Scraper-for-a-news-website

This is a webscraper for a specific website (Economic Times). It is tuned to extract the headlines of that website. With minor adjustments, the webscraper can extract other parts of the website as well.

Installation

Install the following:

Selenium: Follow the instructions at https://selenium-python.readthedocs.io/installation.html to install Selenium.
Chromedriver: Check your Chrome browser's version (Menu -> Help -> About Google Chrome) and download the relevant Chromedriver from https://sites.google.com/chromium.org/driver/home
TQDM: https://pypi.org/project/tqdm/
BeautifulSoup4: https://pypi.org/project/beautifulsoup4/
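Once the packages are installed, a quick check like the one below can confirm that Python is able to find them. This is only a minimal sketch: the function name missing_modules is made up for illustration, and it assumes the usual import names (BeautifulSoup4 is imported as bs4). It does not check Chromedriver, which is a separate executable rather than a Python package.

```python
from importlib.util import find_spec

# Import names of the Python packages listed above.
# Assumption: BeautifulSoup4 is imported under its usual name "bs4".
REQUIRED_MODULES = ("selenium", "tqdm", "bs4")


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names of modules that Python cannot locate."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If anything is reported as missing, install it from the corresponding link above and run the check again.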

Using the webscraper

The scripts depend on each other's output, so the order of execution matters. Run them in the sequence below:

ET_Archive_Links.py: Run this script first, as it is the source of everything that we'll do later. It collects the initial links from the Archive page of the website.
ET_All_Links_Inside_Archive.py: This script takes the output (csv file) of the previous script. It produces a new file which contains the URLs of all the archived news on the website since 2002.
ET_Content.py: Finally, this is the script that scrapes the headlines along with the dates. (If you want to scrape any other part of the website, this is the script you have to edit.)
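The sequence above can also be driven from a single helper so the order is never mixed up. The sketch below is not part of the repository: build_commands and run_pipeline are hypothetical names, and it assumes that each script takes no command-line arguments and is run from the folder that contains it.

```python
import subprocess
import sys

# Script names are taken from the list above, in the required order.
# The second script reads the csv written by the first; the third is
# assumed to use the file produced by the second.
PIPELINE = (
    "ET_Archive_Links.py",
    "ET_All_Links_Inside_Archive.py",
    "ET_Content.py",
)


def build_commands(scripts=PIPELINE, python=sys.executable):
    """Return one command (as an argument list) per script, in order."""
    return [[python, script] for script in scripts]


def run_pipeline(scripts=PIPELINE):
    """Run the scripts one after another, stopping at the first failure."""
    for command in build_commands(scripts):
        print("Running", command[1])
        # check=True raises CalledProcessError if a script exits with an
        # error, so later scripts never run on incomplete input.
        subprocess.run(command, check=True)
```

Calling run_pipeline() from the repository folder would execute the three scripts in sequence. Stopping on the first failure is a deliberate choice here, since a later script has nothing useful to read if an earlier one did not finish.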

Dataset

I used the scraper on another news website named "Businessline". Its dataset is available on Kaggle (https://www.kaggle.com/rsiyanwal/20182019-businessline-headlines).

Owner

Rahul Siyanwal
