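A collection of web-scraping case studies, including but not limited to: Taobao, JD.com, Tmall, Douban, Douyin, Kuaishou, Weibo, WeChat, Alibaba, Toutiao, Pinduoduo (pdd), Youku, iQiyi, Ctrip, 12306, 58.com, Sohu, Baidu Index, Weipu (CQVIP) and Wanfang, Z-Library, OALib, novel sites, bidding sites, procurement sites, and Xiaohongshu.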

Overview

lxSpider

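A collection of web-scraping case studies, including but not limited to: Taobao, JD.com, Tmall, Douban, Douyin, Kuaishou, Weibo, WeChat, Alibaba, Toutiao, Pinduoduo (pdd), Youku, iQiyi, Ctrip, 12306, 58.com, Sohu, Baidu Index, Weipu (CQVIP) and Wanfang, Z-Library, OALib, novel sites, and bidding/procurement sites.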

Introduction

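  • CSDN
  • Time flies, and I have lost count of how many case studies I have written. My articles are published on CSDN first, and the code is pushed to GitHub afterwards. Some CSDN articles are paid case studies; please subscribe as appropriate.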

Disclaimer

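  • This repository is intended for teaching only. Nothing it provides may be used for any commercial purpose or in any illegal or non-compliant scenario.

  • The author accepts no liability for any loss or damage of any kind that users may cause to themselves or others, for whatever reason, when using the code and strategies provided in this repository.

  • Any dispute arising from or related to this repository shall be settled amicably by the parties. If negotiation fails, the author bears no responsibility for any consequences.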


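Columns

Web Scraping Basics: for students who know basic Python syntax and are getting ready to learn web scraping

Web Reverse Engineering Basics: some scraping experience is enough (includes walkthroughs of the Yuanrenxue (猿人学) scraping challenges)

Android Reverse Engineering Basics: tool introductions, reverse-engineering notes, and case studies

Scraping Case Collection: paid column, classic cases, continuously updated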


Contents

Blog

Recommended

Contact

Releases (Kuaishou live-comment (danmaku) collector)
  • Kuaishou live-comment (danmaku) collector (Jan 30, 2021)

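    Usage:

    • 1. Launch run.exe in the dist directory.
    • 2. Enter the streamer's uid, your cookie, and the room ID.
    • 3. Click Start and wait. Do not click it more than once.
    • 4. Make sure the streamer is currently live.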

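    Obtaining the parameters:

    Streamer uid: the last segment of the URL in the browser's address bar.

    For example, if the URL is: https://live.kuaishou.com/u/yingjia2019

    then the streamer's uid is: yingjia2019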
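As a rough illustration of the rule above (the uid is the last path segment of the live-room URL), the sketch below extracts it with Python's standard `urllib.parse`. It is not the tool's own code: the function name `uid_from_url` is hypothetical, and it assumes live-room URLs keep the `https://live.kuaishou.com/u/<uid>` shape shown in the example.

```python
from urllib.parse import urlparse


def uid_from_url(url: str) -> str:
    """Return the last non-empty path segment of a live-room URL (hypothetical helper)."""
    # urlparse separates the query string and fragment, so "?a=1" or "#x" are ignored.
    path = urlparse(url).path  # e.g. "/u/yingjia2019"
    # Filter out empty segments so a trailing slash does not yield "".
    segments = [segment for segment in path.split("/") if segment]
    if not segments:
        raise ValueError(f"URL has no path segment: {url!r}")
    return segments[-1]


if __name__ == "__main__":
    print(uid_from_url("https://live.kuaishou.com/u/yingjia2019"))  # yingjia2019
```

Filtering empty segments keeps the helper tolerant of a trailing slash, which browsers sometimes add when a URL is copied.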

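    Your cookie:

    • 1. Open the developer tools: right-click the page and choose Inspect, or press F12.
    • 2. Click the Network tab in the developer tools.
    • 3. Refresh the page (you can press F5).
    • 4. Find the HTML document whose name matches the streamer's uid, then click Headers on the right.
    • 5. Scroll to the bottom to find the cookie line and copy the did=web_xxxxxxxxxxxxxx; entry from it.
    • 6. The cookie to enter in the tool is web_xxxxxxxxxxxxxx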
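Steps 5 and 6 amount to reading the value of the `did` entry in the request's Cookie header. A minimal Python sketch, assuming the header uses the usual `name=value; name=value` form; `did_from_cookie` is a hypothetical helper that is not part of the tool, and the sample header is made up.

```python
def did_from_cookie(cookie_header: str) -> str:
    """Return the value of the `did` cookie from a raw Cookie header (hypothetical helper)."""
    for pair in cookie_header.split(";"):
        # partition splits on the first "=" only, so the value is kept whole.
        name, sep, value = pair.strip().partition("=")
        if sep and name == "did":
            return value
    raise KeyError("no 'did' entry in the Cookie header")


if __name__ == "__main__":
    header = "clientid=3; did=web_xxxxxxxxxxxxxx; kpf=PC_WEB"  # made-up sample
    print(did_from_cookie(header))  # web_xxxxxxxxxxxxxx
```

Splitting by hand rather than using `http.cookies.SimpleCookie` avoids that parser's strictness about characters in real-world cookie values.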

    Room ID:

    • 1. In the developer tools, click Elements, press Ctrl+F to open the search box, and type: live-stream-id
    • 2. Copy live-stream-id="Zo9Upaz8w90"
    • 3. The room ID to enter is Zo9Upaz8w90
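The search for `live-stream-id` in the Elements panel can likewise be done on saved page HTML with a regular expression. This sketch assumes the attribute appears in double quotes exactly as in step 2; the live page's markup may differ or change over time, and `room_id_from_html` is a hypothetical name.

```python
import re

# Capture whatever sits between the double quotes of live-stream-id="...".
LIVE_STREAM_ID = re.compile(r'live-stream-id="([^"]+)"')


def room_id_from_html(html: str) -> str:
    """Return the first live-stream-id attribute value in the HTML (hypothetical helper)."""
    match = LIVE_STREAM_ID.search(html)
    if match is None:
        raise ValueError("live-stream-id attribute not found")
    return match.group(1)


if __name__ == "__main__":
    snippet = '<div class="player" live-stream-id="Zo9Upaz8w90"></div>'  # made-up sample
    print(room_id_from_html(snippet))  # Zo9Upaz8w90
```

A regex is enough for a single attribute lookup; for anything more involved, an HTML parser is the sturdier choice.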

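    Keep the page open while the tool is running. After the page is closed, the cookie expires after a while.

    This tool is intended for learning only. Do not abuse it.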

    Source code(tar.gz)
    Source code(zip)
    default.rar(21.47 MB)
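  • Novel downloader (Feb 2, 2021)

    Introduction

    1. Novel download (advantage: fast, because complete txt files are collected directly from the web).
    2. Online novel scraping (advantage: broad coverage; almost every published novel can be found).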

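    Special notice:

    • This script is for testing, learning, and research only; commercial use is prohibited. Its legality, accuracy, completeness, and validity are not guaranteed; use your own judgment.

    • No WeChat official account or self-media outlet may repost or publish any resource file in this project in any form.

    • The author is not responsible for any problem with any script in this project, including but not limited to any loss or damage caused by script errors.

    • Do not use any content of this project for commercial or illegal purposes; otherwise you bear the consequences.

    • This project is licensed under GPL-3.0. Where this special notice conflicts with the GPL-3.0 License, this special notice prevails.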

    Source code(tar.gz)
    Source code(zip)
    default.zip(44.16 MB)
Owner
lx
Every noble work is at first impossible.