A web crawler for recording posts in "sina weibo"

Last update: Aug 20, 2022

Overview

Web Crawler for "sina weibo"


Introduction

This script collects the attributes of posts on "sina weibo". Users can record posts from different lists (also called flows or collections), such as search results. The supported lists are described in the "Functions" section.

Functions

Scripts currently available:

Name: `search.py`

Description: Searches for a keyword within a specific time interval and records every post in the search results.

Parameters (edit these at the head of the script):
`search_string`: The string to search for. All posts containing this string are recorded, up to a maximum of 50 pages.
`start_time`: Only posts published after this time are recorded (accurate to the hour).
`end_time`: Only posts published before this time are recorded (accurate to the hour).
`rest_time`: The interval between two requests, in seconds.

Results are saved in Python pickle format at `results/weibo-{search_string}-{start_time}-{end_time}.pkl`. In the filename, `start_time` and `end_time` are formatted as Unix timestamps in seconds.
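To make the role of each parameter concrete, here is a minimal sketch of how a paged crawl loop could use them. It is not the code of `search.py`: the `crawl` function, the injected `fetch_page` callable, the `created_at` field, and the example values are all hypothetical, and treating the time boundaries as inclusive is an assumption.

```python
import time
from datetime import datetime

# Parameters as they might appear at the head of the script.
# The values are hypothetical examples.
search_string = "example"
start_time = datetime(2022, 8, 1, 0)  # accurate to the hour
end_time = datetime(2022, 8, 2, 0)    # accurate to the hour
rest_time = 2                         # seconds between two requests

MAX_PAGES = 50  # the description above states a limit of 50 pages


def crawl(fetch_page, search_string, start_time, end_time, rest_time,
          max_pages=MAX_PAGES):
    """Collect posts page by page, pausing rest_time seconds between requests.

    fetch_page(search_string, page) is a hypothetical callable that returns
    a list of dicts, each with a "created_at" datetime. An empty list is
    taken to mean that there are no more results.
    """
    posts = []
    for page in range(1, max_pages + 1):
        if page > 1:
            time.sleep(rest_time)  # wait between consecutive requests
        batch = fetch_page(search_string, page)
        if not batch:
            break
        # Keep only posts inside the interval (inclusive bounds are assumed).
        posts.extend(
            p for p in batch if start_time <= p["created_at"] <= end_time
        )
    return posts
```

Passing the page fetcher in as an argument keeps the loop free of network code, so the paging limit and the time filter can be checked with a stub before any real request is sent.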


Installation

1. Run `pip install -r requirements.txt`.
2. Find the script you need in the "Functions" section.
3. Edit the parameters at the head of the script.
4. Run the script with Python.
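Once the script has run, the results file can be read back with the standard `pickle` module. The sketch below rebuilds the file path from the documented pattern and loads it. The helper names `result_path` and `load_results` are hypothetical, the use of timezone-aware datetimes and whole-second timestamps is an assumption, and the structure of the loaded object is not specified here, so inspect it before relying on any field.

```python
import pickle
from datetime import datetime, timezone


def result_path(search_string, start_time, end_time):
    """Build the results path from the documented filename pattern.

    Assumes timezone-aware datetimes and whole-second Unix timestamps.
    """
    start_ts = int(start_time.timestamp())
    end_ts = int(end_time.timestamp())
    return f"results/weibo-{search_string}-{start_ts}-{end_ts}.pkl"


def load_results(path):
    """Load a results file. Only unpickle files that you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

For example, `result_path("example", datetime(2022, 1, 1, tzinfo=timezone.utc), datetime(2022, 1, 2, tzinfo=timezone.utc))` returns `results/weibo-example-1640995200-1641081600.pkl`. If the script interprets the times in a local timezone instead, the timestamps in the real filename will differ by the corresponding offset.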
