Crawl Google search results for a given keyword

Last update: Nov 09, 2022

Related tags

Web Crawling GoogleSpider

Overview

GoogleSpider

Crawls Google search results for a given keyword.

Config

DataBase

Data is currently stored in MongoDB. The database configuration is on lines 15-19 of the setting.py file and can be modified as needed.

# MONGODB
MONGO_IP = "localhost"
MONGO_PORT = 27017
MONGO_DB = "Google_spider"
MONGO_USER_NAME = ""
MONGO_USER_PASS = ""
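
As a rough illustration of how these settings might be combined into a connection string, the sketch below builds a standard `mongodb://` URI using only the standard library. The helper name `build_mongo_uri` is hypothetical and not part of GoogleSpider; the project itself may connect in a different way.

```python
from urllib.parse import quote_plus

# Values mirror the settings shown above.
MONGO_IP = "localhost"
MONGO_PORT = 27017
MONGO_DB = "Google_spider"
MONGO_USER_NAME = ""
MONGO_USER_PASS = ""


def build_mongo_uri(ip, port, db, user="", password=""):
    """Hypothetical helper: build a mongodb:// URI from the settings."""
    if user and password:
        # Credentials must be percent-encoded when embedded in a URI.
        auth = f"{quote_plus(user)}:{quote_plus(password)}@"
    else:
        # Empty credentials mean an unauthenticated connection.
        auth = ""
    return f"mongodb://{auth}{ip}:{port}/{db}"


uri = build_mongo_uri(
    MONGO_IP, MONGO_PORT, MONGO_DB, MONGO_USER_NAME, MONGO_USER_PASS
)
print(uri)  # mongodb://localhost:27017/Google_spider
```

A URI of this form can be passed to a client such as `pymongo.MongoClient(uri)`, assuming pymongo is installed.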

Log

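import os

LOG_NAME = os.path.basename(os.getcwd())  # log name, taken from the working directory
LOG_PATH = "log/%s.log" % LOG_NAME  # log file path
LOG_LEVEL = "DEBUG"
LOG_COLOR = True  # colorize log output
LOG_IS_WRITE_TO_CONSOLE = True  # print logs to the console
LOG_IS_WRITE_TO_FILE = True  # write logs to a file
LOG_MODE = "w"  # file open mode
LOG_MAX_BYTES = 10 * 1024 * 1024  # maximum size of each log file, in bytes
LOG_BACKUP_COUNT = 20  # number of log files to keep
LOG_ENCODING = "utf8"  # log file encoding
OTHERS_LOG_LEVAL = "ERROR"  # log level for third-party libraries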
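
To make the meaning of the size and backup settings more concrete, here is a minimal sketch of how options like these could map onto Python's standard `logging` module. The `build_logger` helper is hypothetical; GoogleSpider's actual logging setup may differ.

```python
import logging
import os
from logging.handlers import RotatingFileHandler


def build_logger(name, path, level="DEBUG", mode="w",
                 max_bytes=10 * 1024 * 1024, backup_count=20,
                 encoding="utf8", to_console=True, to_file=True):
    """Hypothetical helper mapping LOG_*-style settings onto stdlib logging."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    if to_file:
        # Create the log directory (e.g. "log/") if it does not exist yet.
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        # Note: RotatingFileHandler switches to append mode when maxBytes > 0,
        # so mode="w" only takes effect if rotation is disabled.
        file_handler = RotatingFileHandler(
            path, mode=mode, maxBytes=max_bytes,
            backupCount=backup_count, encoding=encoding,
        )
        file_handler.setFormatter(fmt)
        logger.addHandler(file_handler)
    if to_console:
        console = logging.StreamHandler()
        console.setFormatter(fmt)
        logger.addHandler(console)
    return logger
```

With the defaults above, a call such as `build_logger("GoogleSpider", "log/GoogleSpider.log")` would roll the file over at about 10 MB and keep up to 20 backups.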

Spider

Download interval

```
SPIDER_SLEEP_TIME = [0, 1]
```

Maximum number of retries per request (100 by default)

```
SPIDER_MAX_RETRY_TIMES = 100
```

Note

If an illegal interface is encountered during crawling, a 'user agent -- illegal interface' exception is thrown, and the crawler task retries until the data is crawled successfully or the request has been retried more than 100 times.
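
The retry behavior described here can be approximated with a small loop. This is only a sketch under stated assumptions: `fetch_with_retry` is a hypothetical helper, and it assumes the two values in `SPIDER_SLEEP_TIME` are the lower and upper bounds, in seconds, of a random pause between attempts. The crawler's real implementation may count retries or sleep differently.

```python
import random
import time

SPIDER_SLEEP_TIME = [0, 1]
SPIDER_MAX_RETRY_TIMES = 100


def fetch_with_retry(fetch, max_retries=SPIDER_MAX_RETRY_TIMES,
                     sleep_range=SPIDER_SLEEP_TIME, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempt budget is used up."""
    last_error = None
    for _ in range(max_retries):
        try:
            return fetch()
        except Exception as exc:  # e.g. 'user agent -- illegal interface'
            last_error = exc
            # Assumption: pause for a random time within the configured range.
            low, high = sleep_range
            sleep(random.uniform(low, high))
    raise RuntimeError(f"gave up after {max_retries} attempts") from last_error
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real delays, for example `fetch_with_retry(download, sleep=lambda seconds: None)` where `download` is any callable that raises on failure.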

Data structure

| key | value type | example |
| --- | --- | --- |
| title | str | "Donald Trump - Wikipedia" |
| keyword | str | "Trump" |
| url | str | "https://en.wikipedia.org/wiki/Donald_Trump" |
| text | str | "Donald Trump - Wikipedia 1 hour ago · Donald John Trump (born June 14, 1946) is an American politician, media personality, and businessman who served as the 45th president of the United States ... Vice President: Mike Pence In office January 20, 2017 – January 20, 2021: In office; January 20, 2017 – January 20, 2021 Occupation: Politician; businessman; television presenter Parents: Fred Trump; Mary Anne MacLeod" |
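
For readers who want to work with stored records in code, the four fields listed above could be described with a typed dictionary. The names `SearchResult` and `is_valid_result` are hypothetical and used only for illustration; the `text` value is shortened here.

```python
from typing import TypedDict


class SearchResult(TypedDict):
    """Hypothetical type describing one stored search result."""
    title: str
    keyword: str
    url: str
    text: str


REQUIRED_KEYS = ("title", "keyword", "url", "text")


def is_valid_result(doc):
    """Return True if doc has every expected key with a str value."""
    return all(isinstance(doc.get(key), str) for key in REQUIRED_KEYS)


example: SearchResult = {
    "title": "Donald Trump - Wikipedia",
    "keyword": "Trump",
    "url": "https://en.wikipedia.org/wiki/Donald_Trump",
    "text": "Donald Trump - Wikipedia 1 hour ago ...",
}
print(is_valid_result(example))  # True
```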

Quick start

Crawl 3 pages of results for the keyword 'Trump':

from spiders.google_curl import GoogleCurl

spider = GoogleCurl('Trump', 3)
spider.start()

The first parameter is the search keyword, and the second parameter is the number of pages to crawl.


TarkovScrappy - A nifty little bot that lets you know if a queried item might be required for a quest at some point in the land of Tarkov!

Dex-scrapper - Hobby project for scraping DEX data on VeChain

LSpider - A front-end crawler customized for passive scanners

A tool that scrapes products on AliExpress: title, price, and product URL.

A simplistic scraper made to download tons of random screenshots made by people.

Webservice wrapper for hhursev/recipe-scrapers (python library to scrape recipes from websites)

The core packages of security analyzer web crawler

Scrapy, a fast high-level web crawling & scraping framework for Python.

Genshin Impact crawler - scrapes artifact information from the Genshin Impact interface

Scrapy uses Request and Response objects for crawling web sites.

Binance Smart Chain Contract Scraper + Contract Evaluator

A web scraper for nomadlist.com, made to avoid website restrictions.

A Python web scraper to scrape latest posts from official Coinbase's Blog.

Xuexi Qiangguo automation - answers quizzes instantly with 100% accuracy, worth 45 points

Explore scraping with BeautifulSoup!

Free-Game-Scraper is a useful script that allows you to track down free games and DLCs on many platforms.

DaProfiler allows you to get emails, social media accounts, addresses, workplaces, and more on your target using web scraping and Google dorking techniques

Dumb downloader that scrapes the web

Transistor, a Python web scraping framework for intelligent use cases.

A database scraper created with mechanical soup and sqlite
