A universal package of scraper scripts for humans


Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Contributing
  5. Sponsors
  6. License
  7. Contact
  8. Acknowledgements

About The Project

Scrapera is a completely Chromedriver-free package that provides a variety of scraper scripts for the most commonly used machine learning and data science domains. Scrapera scrapes directly and asynchronously from public API endpoints, which removes the heavy browser overhead and makes it fast and robust to DOM changes. Currently, Scrapera supports the following crawlers:

  • Images
  • Text
  • Audio
  • Videos
  • Miscellaneous

  • The main aim of this package is to bundle common scraping tasks so that ML researchers and engineers can focus on their models rather than on the data collection process.
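To illustrate why asynchronous requests against API endpoints avoid the cost of driving a browser, here is a minimal sketch of the general pattern using only the standard library. It is not Scrapera's implementation: `fetch_item`, `fetch_all`, and `max_concurrency` are hypothetical names, and the network call is simulated with `asyncio.sleep` so the example runs offline.

```python
import asyncio


async def fetch_item(item_id: int, delay: float = 0.05) -> dict:
    """Hypothetical stand-in for one API request; sleeps instead of using the network."""
    await asyncio.sleep(delay)
    return {"id": item_id, "status": "ok"}


async def fetch_all(item_ids, max_concurrency: int = 5) -> list:
    """Run the simulated requests concurrently, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(item_id: int) -> dict:
        # The semaphore caps how many requests are in flight at once.
        async with semaphore:
            return await fetch_item(item_id)

    # gather returns results in the same order as the input ids.
    return await asyncio.gather(*(bounded(i) for i in item_ids))


results = asyncio.run(fetch_all(range(10)))
```

Because the waits overlap, the ten simulated requests finish in roughly two batches rather than ten sequential delays. A real scraper would replace the sleep with an HTTP call made through an asynchronous client.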

    DISCLAIMER: The owner and contributors take no responsibility for misuse of data obtained through Scrapera. Contact the owner if any module provided by Scrapera violates copyright terms.

    Prerequisites

    Prerequisites can be installed separately from the requirements.txt file:

    pip install -r requirements.txt

    Installation

    Scrapera is built with Python 3 and can be installed directly with pip:

    pip install scrapera

    Alternatively, to install the latest version directly from GitHub, run:

    pip install git+https://github.com/DarshanDeshpande/Scrapera.git

    Usage

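    To use any sub-module, import its scraper class, instantiate it, and run it:

    # Import the scraper class for the target site
    from scrapera.video.vimeo import VimeoScraper
    # Create a scraper instance
    scraper = VimeoScraper()
    # Scrape the given video URL; the second argument here is '540p'
    scraper.scrape('https://vimeo.com/191955190', '540p')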

    For more examples, please refer to the individual test folders in the respective modules.
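When several URLs need to be scraped, one failure should not stop the rest. The sketch below wraps the `scrape(url, quality)` call shown above in a loop that records failures. `scrape_all` and `FakeScraper` are hypothetical helpers written for this illustration and are not part of Scrapera; `FakeScraper` only mimics the call signature so the example runs without the package or a network connection.

```python
from typing import Dict, Iterable, List, Tuple


def scrape_all(scraper, urls: Iterable[str], quality: str = "540p") -> Tuple[List[str], Dict[str, str]]:
    """Call scraper.scrape(url, quality) for each URL, collecting failures instead of raising."""
    succeeded: List[str] = []
    failed: Dict[str, str] = {}
    for url in urls:
        try:
            scraper.scrape(url, quality)
        except Exception as exc:  # keep going if one URL fails
            failed[url] = repr(exc)
        else:
            succeeded.append(url)
    return succeeded, failed


class FakeScraper:
    """Hypothetical stand-in that only mimics the scrape(url, quality) signature."""

    def scrape(self, url: str, quality: str) -> None:
        if "bad" in url:
            raise ValueError("cannot scrape " + url)


ok, errors = scrape_all(
    FakeScraper(),
    ["https://example.com/1", "https://example.com/bad"],
)
```

With the real package installed, an instance such as `VimeoScraper()` could presumably be passed in place of `FakeScraper()`, assuming its `scrape` method raises an exception on failure.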

    Contributing

    Scrapera welcomes all contributions and scraper requests. Please raise an issue if a scraper fails. Feel free to fork the repository and add your own scrapers to help the community!
    For more guidelines, refer to CONTRIBUTING.

    License

    Distributed under the MIT License. See LICENSE for more information.

    Sponsors


    Contact

    Feel free to reach out with any issues or requests related to Scrapera.

    Darshan Deshpande (Owner) - Email | LinkedIn

    Acknowledgements
