A universal package of scraper scripts for humans


Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Contributing
  5. Sponsors
  6. License
  7. Contact
  8. Acknowledgements

About The Project

Scrapera is a completely Chromedriver-free package that provides scraper scripts for the most commonly used machine learning and data science domains. It scrapes directly and asynchronously from public API endpoints, which removes the overhead of driving a browser and makes Scrapera fast and robust to DOM changes. Currently, Scrapera supports the following crawlers:

  • Images
  • Text
  • Audio
  • Videos
  • Miscellaneous

The main aim of this package is to group common scraping tasks so that ML researchers and engineers can focus on their models rather than on the data collection process.
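The asynchronous, browser-free approach described above can be illustrated with a minimal sketch. This is not Scrapera's actual implementation: `fetch_item`, `fetch_all`, the `example.com` URLs and the simulated delay are all hypothetical stand-ins, and a real scraper would issue an HTTP request where the sketch sleeps.

```python
import asyncio


async def fetch_item(item_id: int) -> dict:
    """Hypothetical stand-in for one request to a public API endpoint."""
    # A real scraper would await an HTTP call here; the sleep only simulates latency.
    await asyncio.sleep(0.05)
    return {"id": item_id, "url": f"https://example.com/media/{item_id}"}


async def fetch_all(item_ids: list[int], limit: int = 5) -> list[dict]:
    """Fetch all items concurrently, with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(item_id: int) -> dict:
        async with semaphore:
            return await fetch_item(item_id)

    # gather() returns results in the same order as the input ids.
    return await asyncio.gather(*(bounded(i) for i in item_ids))


if __name__ == "__main__":
    items = asyncio.run(fetch_all(list(range(10))))
    print(f"fetched {len(items)} items")
```

Because the requests overlap instead of running one after another, total time is governed by the slowest batch rather than by the sum of all requests, and no browser process has to be started.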

    DISCLAIMER: The owner and contributors do not take any responsibility for misuse of data obtained through Scrapera. Contact the owner if any module provided by Scrapera violates copyright terms.

    Prerequisites

    Prerequisites can be installed separately from the requirements.txt file:

    pip install -r requirements.txt

    Installation

    Scrapera is built with Python 3 and can be installed directly with pip:

    pip install scrapera

    Alternatively, to install the latest version directly from GitHub, run:

    pip install git+https://github.com/DarshanDeshpande/Scrapera.git

    Usage

    To use any sub-module, import its scraper class, instantiate it and execute it, for example:

    from scrapera.video.vimeo import VimeoScraper
    scraper = VimeoScraper()
    scraper.scrape('https://vimeo.com/191955190', '540p')

    For more examples, refer to the test folders of the individual modules.
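When scraping several URLs, it can help to record which ones fail so they can be reported in an issue. The helper below is a small sketch, not part of Scrapera: `scrape_all` is a hypothetical name, and it only assumes that the scraper object exposes the `scrape(url, quality)` call shown in the Vimeo example above. Other scrapers may take different arguments.

```python
def scrape_all(scraper, urls, quality="540p"):
    """Run scraper.scrape(url, quality) for each URL and collect failures.

    Returns a dict mapping each failed URL to a description of its error.
    """
    failures = {}
    for url in urls:
        try:
            scraper.scrape(url, quality)
        except Exception as exc:  # deliberately broad: one bad URL should not stop the batch
            failures[url] = repr(exc)
    return failures


# Usage (assumes scrapera is installed and the URL is reachable):
# from scrapera.video.vimeo import VimeoScraper
# failed = scrape_all(VimeoScraper(), ["https://vimeo.com/191955190"])
# for url, error in failed.items():
#     print(url, error)
```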

    Contributing

    Scrapera welcomes all contributions and scraper requests. Please raise an issue if a scraper fails. Feel free to fork the repository and add your own scrapers to help the community!
    For more guidelines, refer to CONTRIBUTING.

    License

    Distributed under the MIT License. See LICENSE for more information.

    Sponsors

    Contact

    Feel free to reach out with any issues or requests related to Scrapera.

    Darshan Deshpande (Owner) - Email | LinkedIn

    Acknowledgements
