A Python script to check whether an application's responses differ when the request comes from a search engine's crawler.

Last update: Dec 27, 2022

Overview

crawlersuseragents

This Python script checks whether an application's responses differ when the request comes from a search engine's crawler.

Features

30 crawler user agent strings.
Multithreading.
JSON export with --jsonfile outputfile.json.
Auto-detection of responses that stand out.
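
The first two features can be illustrated with a minimal sketch. It is not the script's actual implementation: the two user agent strings are a small illustrative subset (the script ships its own list of 30), and the names `fetch`, `probe`, and the `fetcher` parameter are hypothetical. Proxy support and TLS verification options are left out.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request

# Illustrative subset only; the real script has its own list of 30 strings.
CRAWLER_USER_AGENTS = {
    "Googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Bingbot": "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
}


def fetch(url, user_agent, timeout=10):
    """Send one GET request and return (status_code, body_length).

    Note: urllib raises HTTPError for 4xx/5xx responses; a fuller tool
    would catch it and record the status instead.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, len(response.read())


def probe(url, user_agents=CRAWLER_USER_AGENTS, threads=5, fetcher=fetch):
    """Request `url` once per user agent using a pool of worker threads.

    `fetcher` is injectable so the function can be exercised offline.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = {
            name: pool.submit(fetcher, url, user_agent)
            for name, user_agent in user_agents.items()
        }
        # Collect results once every request has finished.
        return {name: future.result() for name, future in futures.items()}
```

Calling `probe("https://example.com")` would return a dictionary mapping each crawler name to its status code and response length, which can then be compared across user agents.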

Usage

$ ./crawlersuseragents.py -h
[~] Access web pages as web crawlers User-Agents, v1.1

usage: crawlersuseragents.py [-h] [-v] [-t THREADS] [-x PROXY] [-k] [-L] [-j JSONFILE] url

This Python script can be used to check if there is any differences in responses of an application
when the request comes from a search engine's crawler.

positional arguments:
  url                   e.g. https://example.com:port/path

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         arg1 help message
  -t THREADS, --threads THREADS
                        Number of threads (default: 5)
  -x PROXY, --proxy PROXY
                        Specify a proxy to use for requests (e.g., http://localhost:8080)
  -k, --insecure        Allow insecure server connections when using SSL (default: False)
  -L, --location        Follow redirects (default: False)
  -j JSONFILE, --jsonfile JSONFILE
                        Save results to specified JSON file.
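
For readers who want to reproduce a similar command-line interface, the options listed above could be declared with `argparse` roughly as follows. This is a sketch under assumptions, not the script's source: `build_parser` is a hypothetical name, and `-v` is assumed to be a simple on/off flag because the usage line shows it without a value.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options shown in the help output."""
    parser = argparse.ArgumentParser(
        prog="crawlersuseragents.py",
        description="Check whether responses differ for crawler User-Agents.",
    )
    parser.add_argument("url", help="e.g. https://example.com:port/path")
    # Assumption: -v is a boolean flag (the usage line shows no value for it).
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Verbose output")
    parser.add_argument("-t", "--threads", type=int, default=5,
                        help="Number of threads (default: 5)")
    parser.add_argument("-x", "--proxy", default=None,
                        help="Proxy to use for requests")
    parser.add_argument("-k", "--insecure", action="store_true",
                        help="Allow insecure server connections when using SSL")
    parser.add_argument("-L", "--location", action="store_true",
                        help="Follow redirects")
    parser.add_argument("-j", "--jsonfile", default=None,
                        help="Save results to specified JSON file")
    return parser


if __name__ == "__main__":
    # Parse a sample command line instead of sys.argv for demonstration.
    args = build_parser().parse_args(["-t", "10", "-k", "https://example.com"])
    print(args)
```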

Auto-detecting responses that stand out

Results are sorted by the uniqueness of their response length. Results with a unique response length appear at the top, and results whose response length occurs several times appear at the bottom:

Example screenshots: two different result lengths; four different result lengths.
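
The ordering described in this section can be sketched in a few lines. This is one possible way to do it, assuming each result is a `(name, response_length)` pair; the function name `sort_by_length_uniqueness` is hypothetical and the script's own data structures may differ.

```python
from collections import Counter


def sort_by_length_uniqueness(results):
    """Order results so that the rarest response lengths come first.

    `results` is a list of (name, response_length) pairs. The sort is
    stable, so results sharing a frequency keep their input order.
    """
    # Count how many results share each response length.
    counts = Counter(length for _, length in results)
    return sorted(results, key=lambda item: counts[item[1]])


if __name__ == "__main__":
    sample = [("A", 100), ("B", 100), ("C", 250), ("D", 100)]
    for name, length in sort_by_length_uniqueness(sample):
        print(name, length)
```

With the sample data, `C` is listed first because its length of 250 occurs only once, followed by `A`, `B`, and `D`, which all share a length of 100.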

Contributing

Pull requests are welcome. Feel free to open an issue if you want to add other features.

Releases (1.1)

1.1 (Nov 15, 2021)

Owner

Podalirius
