A multithreaded tool for searching and downloading images from popular search engines. It is straightforward to set up and run!

Last update: Dec 31, 2022

🕳️ CygnusX1

Code by Trong-Dat Ngo.

Overview

🕳️ CygnusX1 is a multithreaded tool 🛠️ for searching and downloading images from popular search engines 🔎. It is straightforward to set up and run!

Key features

🥰 No prior knowledge is required to get up and running.
🚀 Download images using a customizable number of threads.
⛏️ Crawl all possible images (search results and recommendations).
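
As a rough illustration of the "customizable number of threads" feature, the sketch below downloads a list of URLs with a bounded thread pool. It is not CygnusX1's actual code: the names `download_all` and `fetch` are hypothetical, and the fetcher is passed in as a parameter so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def download_all(urls: Iterable[str],
                 fetch: Callable[[str], bytes],
                 workers: int = 2) -> Dict[str, bytes]:
    """Fetch every URL with at most `workers` threads; skip URLs that fail.

    `fetch` is a hypothetical callable (URL -> bytes), not part of CygnusX1.
    """
    results: Dict[str, bytes] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit one task per URL and remember which future belongs to which URL.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # One failed download should not stop the others.
                print(f"skipped {url}: {exc}")
    return results


if __name__ == "__main__":
    # Stand-in fetcher so the sketch runs without network access;
    # a real one would issue an HTTP request and return the image bytes.
    def fake_fetch(url: str) -> bytes:
        return url.encode("utf-8")

    images = download_all(["a.jpg", "b.jpg", "c.jpg"], fake_fetch, workers=8)
    print(sorted(images))
```

Raising `workers` lets more downloads overlap while they wait on the network, which is where a thread pool helps most for this kind of I/O-bound work.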

Installation

This repository is tested on Python 3.6+ and selenium 3.141.0+, and it works on macOS, Windows, and Linux.

You should set up and run 🕳️ CygnusX1 in a virtual environment. If you're unfamiliar with Python virtual environments, check out the user guide here.

First, create a virtual environment with the version of Python you're going to use and activate it. (You can skip this step if you prefer to install directly into the system environment.)

python3 -m venv venv
source venv/bin/activate

Then download 🕳️ CygnusX1 from GitHub:

git clone https://github.com/dat821168/CygnusX1.git

Finally, move into the cloned directory and install the dependencies listed in requirements.txt:

cd CygnusX1
pip install -r requirements.txt

Run

Use run.py to start the script:

python run.py --keywords "keyword 1, keyword 2" --workers 8 --use_suggestions --headless

Argument details:

--keywords: The keywords or keyphrases you want to search for. Separate multiple keywords with commas.
--out_dir: Path where results are saved. Default = './IMAGES'.
--workers: The maximum number of workers used to crawl images. Default = 2.
--use_suggestions: Also crawl the search engine's suggestions/recommendations. Default = False.
--headless: Hide the browser window during scraping. Default = False.
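
To show how the arguments above fit together, here is a minimal `argparse` sketch that mirrors the documented flags and defaults. It is an assumption about how such a command line could be parsed, not a copy of run.py: the helper names `build_parser` and `split_keywords` are hypothetical, and treating `--keywords` as required is a guess.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (sketch, not run.py itself)."""
    parser = argparse.ArgumentParser(description="Search and download images.")
    # Assumption: --keywords is mandatory; the README does not say so explicitly.
    parser.add_argument("--keywords", type=str, required=True,
                        help="Comma-separated keywords or keyphrases.")
    parser.add_argument("--out_dir", type=str, default="./IMAGES",
                        help="Path where results are saved.")
    parser.add_argument("--workers", type=int, default=2,
                        help="Maximum number of workers used to crawl images.")
    # store_true flags default to False, matching the documented defaults.
    parser.add_argument("--use_suggestions", action="store_true",
                        help="Also crawl search engine suggestions.")
    parser.add_argument("--headless", action="store_true",
                        help="Hide the browser window during scraping.")
    return parser


def split_keywords(raw: str) -> List[str]:
    """Split a comma-separated string into trimmed, non-empty keywords."""
    return [part.strip() for part in raw.split(",") if part.strip()]


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--keywords", "keyword 1, keyword 2", "--workers", "8", "--headless"]
    )
    print(split_keywords(args.keywords), args.workers, args.out_dir, args.headless)
```

Because both switches use `action="store_true"`, leaving them off the command line yields `False`, which is consistent with the defaults listed above.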

