A universal package of scraper scripts for humans


Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Contributing
  5. License
  6. Sponsors
  7. Contact
  8. Acknowledgements

About The Project

Scrapera is a completely Chromedriver-free package that provides scraper scripts for the most commonly used machine learning and data science domains. It scrapes public API endpoints directly and asynchronously, removing the heavy browser overhead; this makes Scrapera extremely fast and robust to DOM changes. Currently, Scrapera supports the following crawlers:

  • Images
  • Text
  • Audio
  • Videos
  • Miscellaneous
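The browser-free approach described above amounts to hitting public API endpoints concurrently instead of driving a browser. Below is a minimal illustrative sketch of that pattern; the endpoint URLs and the stubbed `fetch_json` helper are hypothetical stand-ins, not Scrapera internals:

```python
import asyncio

async def fetch_json(url):
    # Stand-in for a real non-blocking HTTP GET (e.g. via aiohttp);
    # stubbed here so the sketch runs without network access.
    await asyncio.sleep(0)
    return {"url": url, "items": []}

async def scrape_all(urls):
    # Fire all endpoint requests concurrently -- no browser, no Chromedriver.
    return await asyncio.gather(*(fetch_json(u) for u in urls))

results = asyncio.run(scrape_all([
    "https://example.com/api/page/1",
    "https://example.com/api/page/2",
]))
print(len(results))  # prints 2
```

Because responses come straight from structured endpoints rather than rendered pages, this style of scraper is unaffected by changes to a site's DOM.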

The main aim of this package is to bundle common scraping tasks so that ML researchers and engineers can focus on their models rather than on the data collection process.

DISCLAIMER: The owner and contributors take no responsibility for any misuse of data obtained through Scrapera. Contact the owner if any module provided by Scrapera violates copyright terms.

Getting Started

Prerequisites

Prerequisites can be installed separately from the requirements.txt file:

    pip install -r requirements.txt

Installation

Scrapera is built for Python 3 and can be installed directly with pip:

    pip install scrapera

Alternatively, to install the latest version directly from GitHub, run:

    pip install git+https://github.com/DarshanDeshpande/Scrapera.git

Usage

To use any sub-module, simply import, instantiate, and execute:

    from scrapera.video.vimeo import VimeoScraper
    scraper = VimeoScraper()
    scraper.scrape('https://vimeo.com/191955190', '540p')

For more examples, please refer to the test folders in the respective modules.

Contributing

Scrapera welcomes all contributions and scraper requests. Please raise an issue if a scraper fails in any instance. Feel free to fork the repository and add your own scrapers to help the community!
For further guidelines, refer to CONTRIBUTING.

License

Distributed under the MIT License. See LICENSE for more information.

Sponsors


Contact

Feel free to reach out for any issues or requests related to Scrapera.

Darshan Deshpande (Owner) - Email | LinkedIn

Acknowledgements
