Screen scraping and web crawling framework

Last update: Jun 21, 2021

Overview

Pomp

Pomp is a screen scraping and web crawling framework. It is inspired by and similar to Scrapy, but has a simpler implementation without Scrapy's hard dependency on Twisted.
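
The crawler/engine split behind a framework like this can be shown in a small one-file program. The sketch below is plain Python and is not Pomp's API. The names `Crawler`, `extract_items`, `next_urls`, `pump` and `DEMO_SITE` are hypothetical. An in-memory "site" stands in for a real downloader so the example runs offline.

```python
import re
from collections import deque
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical names throughout; an illustration, not Pomp's actual API.
LINK_RE = re.compile(r'href="([^"]+)"')
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.I | re.S)


class Crawler:
    """Turns a downloaded page into items and follow-up URLs."""

    start_urls: List[str] = []

    def extract_items(self, url: str, body: str) -> Iterable[dict]:
        raise NotImplementedError

    def next_urls(self, url: str, body: str) -> Iterable[str]:
        return []  # default: single-page crawl


def pump(crawler: Crawler, fetch: Callable[[str], str], max_pages: int = 100) -> List[dict]:
    """Breadth-first crawl loop: fetch, extract items, enqueue unseen URLs."""
    queue = deque(crawler.start_urls)
    seen: Set[str] = set(queue)
    items: List[dict] = []
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        body = fetch(url)
        fetched += 1
        items.extend(crawler.extract_items(url, body))
        for link in crawler.next_urls(url, body):
            if link not in seen:  # avoid crawling the same URL twice
                seen.add(link)
                queue.append(link)
    return items


class TitleCrawler(Crawler):
    """Collects <title> text and follows every href on the page."""

    start_urls = ["/index"]

    def extract_items(self, url: str, body: str) -> Iterable[dict]:
        match = TITLE_RE.search(body)
        if match:
            yield {"url": url, "title": match.group(1)}

    def next_urls(self, url: str, body: str) -> Iterable[str]:
        return LINK_RE.findall(body)


# Offline stand-in for a downloader: maps URL -> HTML.
DEMO_SITE: Dict[str, str] = {
    "/index": '<title>Home</title><a href="/a">A</a> <a href="/b">B</a>',
    "/a": '<title>Page A</title><a href="/index">back</a>',
    "/b": "<title>Page B</title>",
}

if __name__ == "__main__":
    for item in pump(TitleCrawler(), DEMO_SITE.__getitem__):
        print(item)
```

Passing `fetch` in as a plain callable keeps networking pluggable in this sketch: replacing the dictionary lookup with an HTTP client changes nothing else in the loop.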

Features:

Pure Python
Only one dependency on Python 2.x: concurrent.futures (a backport of the standard-library package for Python 2.x)
Supports one-file applications; Pomp doesn't force a specific project layout or other restrictions.
Pomp is a meta framework like Paste: you may use it to create your own scraping framework.
Extensible networking: you may use any sync or async method.
No parsing libraries in the core; use your preferred approach.
Pomp instances may be distributed and are designed to work with an external queue.
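
The "any sync or async method" point, together with the concurrent.futures dependency, suggests a downloader that runs blocking fetches in a thread pool. The sketch below is an assumption about how such a downloader could look, not Pomp's own downloader class; `ThreadedDownloader` and `urllib_fetch` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List
from urllib.request import urlopen


def urllib_fetch(url: str, timeout: float = 10.0) -> bytes:
    """Blocking fetch using only the standard library."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


class ThreadedDownloader:
    """Hypothetical downloader: runs a blocking fetch function in a thread pool."""

    def __init__(self, fetch: Callable[[str], bytes] = urllib_fetch, max_workers: int = 4):
        self.fetch = fetch
        self.max_workers = max_workers

    def get(self, urls: Iterable[str]) -> List[bytes]:
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            # Executor.map runs fetches concurrently but returns results in input order.
            return list(pool.map(self.fetch, urls))


if __name__ == "__main__":
    # Offline demo: a fake fetch function instead of real HTTP.
    fake = ThreadedDownloader(fetch=lambda url: url.encode("ascii"))
    print(fake.get(["/a", "/b", "/c"]))  # [b'/a', b'/b', b'/c']
```

Because the fetch function is injected, the same class can wrap a standard-library call, an HTTP client library, or a test stub.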

Pomp makes no attempt to accommodate:

redirects
proxies
caching
database integration
cookies
authentication
etc.

If you want proxies, redirects, or similar, you may use the excellent requests library as the Pomp downloader.
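
A minimal sketch of the requests-based approach follows. It assumes a downloader only needs to turn URLs into responses. `SessionDownloader`, `FetchedPage` and `StubSession` are hypothetical names, not part of Pomp. In real use you would pass a `requests.Session`, whose `get()` follows redirects by default, whose `proxies` attribute applies proxies to every request, and which persists cookies between requests. The stub session lets the example run without network access or the requests package installed.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Optional, Tuple


@dataclass
class FetchedPage:
    url: str      # final URL, after any redirects the session followed
    status: int
    body: bytes


class SessionDownloader:
    """Hypothetical downloader delegating HTTP to a requests-style session."""

    def __init__(self, session: Any, timeout: float = 10.0):
        # Real use (requires the requests package):
        #   session = requests.Session()
        #   session.proxies = {"https": "http://proxy.example:3128"}  # placeholder proxy
        self.session = session
        self.timeout = timeout

    def get(self, urls: Iterable[str]) -> List[FetchedPage]:
        pages = []
        for url in urls:
            response = self.session.get(url, timeout=self.timeout)
            pages.append(FetchedPage(response.url, response.status_code, response.content))
        return pages


class _StubResponse:
    def __init__(self, url: str, status_code: int, content: bytes):
        self.url = url
        self.status_code = status_code
        self.content = content


class StubSession:
    """Offline stand-in exposing the small subset of requests.Session used above."""

    def __init__(self, pages: Dict[str, Tuple[int, bytes]]):
        self.pages = pages
        self.proxies: Dict[str, str] = {}

    def get(self, url: str, timeout: Optional[float] = None) -> _StubResponse:
        status, content = self.pages.get(url, (404, b""))
        return _StubResponse(url, status, content)


if __name__ == "__main__":
    stub = StubSession({"https://example.com/": (200, b"<html></html>")})
    for page in SessionDownloader(stub).get(["https://example.com/", "https://example.com/missing"]):
        print(page.url, page.status)
```

Keeping all HTTP concerns inside the session object means the crawl logic never has to know whether proxies, redirects or cookies are in play.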


Pomp is written and maintained by Evgeniy Tatarkin and is licensed under the BSD license.


A LeetCode scraper that compiles all free-tier LeetCode questions into a text file; a PDF is also available.

A dead simple crawler to get book information from Douban.

OSTA web scraper for checking the status of school buses in Ottawa

Scraping script for stats on the COVID-19 pandemic in Chiba Prefecture, Japan

Web scraping Instagram photos with Selenium using only a hashtag.

This is a Python script that automates the browser by using Selenium to locate the relevant elements and then trigger click events.

A web scraper that exports your entire WhatsApp chat history.

Automatically completes the daily body-temperature report (GitHub Actions)

🥫 The simple, fast, and modern web scraping library

Highly available distributed IP proxy pool, powered by Scrapy and Redis

A Very simple free proxy list scraper.

An experiment to deploy a serverless infrastructure for a scrapy project.

A command-line program to download media, like and unlike posts, and more from creators on OnlyFans.

Scraping and visualising India's real-time COVID-19 data from the MOHFW dataset.

Parse feeds in Python

Facebook Group Scraping Using Beautiful Soup & Selenium

Raspi-scraper is a configurable Python web scraper that checks Raspberry Pi stock from verified sellers

Nekopoi scraper using Python 3

A Python module to bypass Cloudflare's anti-bot page.

Scraping web pages to get data