A high-performance, lightweight, and human-friendly serving engine for Scrapy.


scrapy-x (X)

A distributed, scalable, and lightweight environment for deploying and running Scrapy spiders/projects with no hassle on commodity hardware. It is also compatible with the scrapyd /schedule.json and /daemonstatus.json endpoints.

Installation

$ pip install -U git+https://github.com/speakol-ads/scrapy-x.git

Usage

Let's assume that you have a project called TestCrawler:

  • cd into TestCrawler
  • run scrapy x
  • that's all!
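
Once the server is running, you can check that it is up via the scrapyd-compatible daemon status endpoint. A minimal sketch using the requests library (assuming the default host and port from the settings below):

import requests

# scrapy-x is scrapyd-compatible, so the usual daemon
# status endpoint works as a simple health check
status = requests.get('http://localhost:6800/daemonstatus.json')
print(status.json())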

Default Settings

scrapy-x reads its configuration from your project's default settings.py file. The available settings, with their default values, are:

import os

# whether to enable debug mode or not
X_DEBUG = True

# the default queue name that the system will use
# actually it will be used as a prefix for its internal
# queues, currently there is only one queue called `X_QUEUE_NAME + '.BACKLOG'`
# which holds all jobs that should be crawled.
X_QUEUE_NAME = 'SCRAPY_X_QUEUE'

# the number of queue workers
# defaults to the cpu core count
# try to adjust it based on your resources & needs
X_QUEUE_WORKERS_COUNT = os.cpu_count()

# the number of web server workers
# i.e. the worker count uvicorn is asked to spawn
# defaults to the available cpu count
# try to adjust it based on your resources & needs
X_SERVER_WORKERS_COUNT = os.cpu_count()

# the port the http server should listen on
X_SERVER_LISTEN_PORT = 6800

# the host used by the http server to listen on
X_SERVER_LISTEN_HOST = '0.0.0.0'

# whether to enable access log or not
X_ENABLE_ACCESS_LOG = True

# redis host
X_REDIS_HOST = 'localhost'

# redis port
X_REDIS_PORT = 6379

# redis db
X_REDIS_DB = 0

# redis password
X_REDIS_PASSWORD = ''

# the maximum allowed wait time for a running task;
# the task will be killed once this is exceeded.
X_TASK_TIMEOUT = 25
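
Note that os.cpu_count() can return None when the core count cannot be determined, which would leave the worker counts unset. A defensive sketch for settings.py (the fallback value of 4 is an arbitrary assumption, not a scrapy-x default):

import os

# fall back to a fixed worker count when os.cpu_count()
# cannot detect the core count and returns None
_CPU_COUNT = os.cpu_count() or 4

X_QUEUE_WORKERS_COUNT = _CPU_COUNT
X_SERVER_WORKERS_COUNT = _CPU_COUNT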

Available Endpoints

In addition to the scrapyd core endpoints (/schedule.json and /daemonstatus.json), the following endpoints are available:

GET /

Returns some info about the engine, such as the available spiders and the backlog queue length.
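
A minimal sketch of fetching that info with the requests library (the response is assumed to be JSON; the exact fields beyond those described above are not documented here):

import requests

# query the engine for its status info
# (assumes scrapy-x is running on the default host/port)
resp = requests.get('http://localhost:6800/')
resp.raise_for_status()
print(resp.json())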

GET|POST /run/{spider_name}

Executes the spider specified by {spider_name} and waits for it to return its result. Note: any query parameter and any JSON POST data will be passed to the spider as arguments (-a key=value).

GET|POST /enqueue/{spider_name}

Adds the spider specified by {spider_name} to the backlog, to be executed later. Note: any query parameter and any JSON POST data will be used as spider arguments.
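
A minimal sketch of calling both endpoints with the requests library (the spider name TestSpider and the argument name category are hypothetical):

import requests

BASE = 'http://localhost:6800'  # default host/port from the settings above

# run a spider synchronously; the query param is passed
# to the spider as `-a category=books`
result = requests.get(f'{BASE}/run/TestSpider', params={'category': 'books'})
print(result.json())

# enqueue the same spider for later execution, passing
# the arguments as a JSON POST body instead
queued = requests.post(f'{BASE}/enqueue/TestSpider', json={'category': 'books'})
print(queued.json())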

Technologies Used

Author

I'm Mohamed, a software engineer who enjoys writing code in his free time. I speak Python, PHP, Go, Rust, and JS.

My Similar Projects

P.S.: star the project if you liked it ^_^
