Web Scraping OLX with Python and BeautifulSoup.

Last update: Sep 25, 2022

Overview

webScrap

WebScraping first step.

Authors: Paulo, Claudio M.

First steps in web scraping: a project carried out as training in web scraping. Pages with known addresses are downloaded with requests, the information is located in the markup parsed by BeautifulSoup with the 'lxml' parser and collected into a structured table (a pandas DataFrame), and the result is finally exported in CSV format.
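
The flow described above might look like the following minimal sketch. It is not the project's actual code: the HTML structure, the class names (ad, ad-title, ad-price, ad-city), the User-Agent value, and the function names fetch_html and parse_ads are all assumptions made for illustration, and the real OLX markup will differ. Only a subset of the fields is extracted to keep the example short.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup, FeatureNotFound


def fetch_html(url: str) -> str:
    """Download one page. The User-Agent value is an illustrative assumption."""
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
    response.raise_for_status()
    return response.text


def parse_ads(html: str) -> pd.DataFrame:
    """Build one row per ad from markup that uses HYPOTHETICAL class names."""
    try:
        soup = BeautifulSoup(html, "lxml")
    except FeatureNotFound:
        # Fall back to the standard-library parser when lxml is not installed.
        soup = BeautifulSoup(html, "html.parser")

    rows = []
    for ad in soup.find_all("li", class_="ad"):
        rows.append({
            "title": ad.find(class_="ad-title").get_text(strip=True),
            "price": ad.find(class_="ad-price").get_text(strip=True),
            "city": ad.find(class_="ad-city").get_text(strip=True),
        })
    return pd.DataFrame(rows, columns=["title", "price", "city"])


# Made-up sample page so the sketch runs without network access.
SAMPLE_HTML = """
<ul>
  <li class="ad"><span class="ad-title">Bicicleta aro 29</span>
      <span class="ad-price">R$ 1.200</span><span class="ad-city">Recife</span></li>
  <li class="ad"><span class="ad-title">Notebook usado</span>
      <span class="ad-price">R$ 2.500</span><span class="ad-city">Olinda</span></li>
</ul>
"""

ads = parse_ads(SAMPLE_HTML)
csv_text = ads.to_csv(index=False)  # pass a file path instead to write to disk
print(csv_text)
```

In a real run, parse_ads(fetch_html(url)) would replace the sample string, and a call such as ads.to_csv("ads.csv", index=False) (file name chosen here only as an example) would write the export.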

How can the search for related words in OLX ads be automated?
Can I use quartile analysis to find the best product at the best price?

Our Plan

Select the list of related words.
Use requests to download the page.
Use BeautifulSoup to parse the downloaded page with the lxml parser.
Create a structured database with date and time of posting, ad title, product value, city and neighborhood where it is being advertised.
Filter the database by removing ads whose ad title does not contain the desired words.
Use percentiles and the mean to characterize the price of advertisements by city (within Brazilian states).
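
The last two steps of the plan (keyword filtering and price statistics by city) can be sketched as follows. This is a hedged illustration rather than the project's code: the sample rows, the keyword list, the column names, and the helper parse_price are assumptions, and prices are assumed to use the Brazilian format (a dot as thousands separator, a comma as decimal separator). Treating ads at or below a city's first quartile as candidates for a good price is one possible reading of the quartile question, not a method stated by the authors.

```python
import re

import pandas as pd

# Made-up sample rows; a real run would use the scraped DataFrame.
ads = pd.DataFrame({
    "title": ["Bicicleta aro 29", "Bicicleta infantil", "Sofá 3 lugares",
              "Bicicleta speed", "Bicicleta usada"],
    "price": ["R$ 1.200", "R$ 300", "R$ 800", "R$ 2.000", "R$ 600"],
    "city": ["Recife", "Recife", "Recife", "Olinda", "Olinda"],
})

keywords = ["bicicleta", "bike"]  # hypothetical search words


def parse_price(text: str) -> float:
    """Convert a Brazilian-formatted price such as 'R$ 1.200,50' to a float."""
    cleaned = text.replace("R$", "").strip().replace(".", "").replace(",", ".")
    return float(cleaned)


# Keep only ads whose title contains at least one keyword (case-insensitive).
pattern = "|".join(re.escape(word) for word in keywords)
matches = ads[ads["title"].str.contains(pattern, case=False, regex=True)].copy()
matches["price_value"] = matches["price"].map(parse_price)

# Quartiles and mean of the price for each city.
grouped = matches.groupby("city")["price_value"]
summary = grouped.quantile([0.25, 0.5, 0.75]).unstack()
summary.columns = ["q1", "median", "q3"]
summary["mean"] = grouped.mean()

# Ads priced at or below the first quartile of their own city.
cheap = matches[matches["price_value"] <= matches["city"].map(summary["q1"])]

print(summary)
print(cheap[["title", "price", "city"]])
```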

Current progress

Data scraping was carried out and the database was created to analyze the average value by city.

The database is built from information in advertisements on the OLX Brasil website.

The code uses variable names and comments in Portuguese, and the advertisement search is carried out with Portuguese words.
