Web Crawlers for Data Labelling of Malicious Domain Detection & IP Reputation Evaluation

Last update: Nov 05, 2021

Related tags

Overview

Web Crawlers for Data Labelling of Malicious Domain Detection & IP Reputation Evaluation

This repository provides two web crawlers to label domain names using the McAfee API (https://www.trustedsource.org/sources/index.pl) and IP reputation using the TALOS API (https://talosintelligence.com/), respectively.

Requirements

BeautifulSoup

Usage

Descriptions of the demonstration code are as follows.

To label the categories of a set of domains, put the domain list in 'data/domain_list.txt' and run 'demo_domain_label.py'. The program will label the (1) category (e.g., Malicious Sites- Parked Domain) as well as (2) risk level (e.g., High Risk) of each domain (using the McAfee API) and save the results in 'res/domain_labels.txt'. When the program continuously outputs ''-Retry-'', please stop the program and wait for a moment. After the waiting, you can start the program again, which can automatically skip the domains already labeled and continue to label the rest domains.
To label the reputation of a set of IP addresses, put the IP list in 'data/IP_list.txt' and run 'demo_IP_label.py'. The program will label the (1) email reputation as well as (2) web reputation (with 3 levels of Poor, Neutral, and Good) and save the results in 'res/IP_labels.txt'. When the program continuously outputs ''None'', please stop the program and wait for a moment. After the waiting, you can start the program again, which can automatically skip the IPs already labeled and continue to label the rest IPs.
An example domain name list (with 21,820 effective second-level domains) and an example IP list (with 67,751 IP addresses) are given in 'data/examples/example_domain_list.txt' and 'data/examples/example_IP_list.txt', repsectively. The corresponding labeled results are saved in 'res/examples/example_domain_labels.txt' and 'res/examples/example_IP_labels.txt', respectively.

If you have questions regarding this repository, you can contact the author via [[email protected]].

Web Crawlers for Data Labelling of Malicious Domain Detection & IP Reputation Evaluation

Related tags

Overview

Web Crawlers for Data Labelling of Malicious Domain Detection & IP Reputation Evaluation

Requirements

Usage

Owner

Pseudo API for Google Trends

Python script to check if there is any differences in responses of an application when the request comes from a search engine's crawler.

CRI Scrape is a tool for get general info about Italian Red Cross in GAIA Platform

A Python module to bypass Cloudflare's anti-bot page.

Find thumbnails and original images from URL or HTML file.

Instagram profile scrapper with python

SearchifyX, predecessor to Searchify, is a fast Quizlet, Quizizz, and Brainly webscraper with various stealth features.

🥫 The simple, fast, and modern web scraping library

Web scrapping tool written in python3, using regex, to get CVEs, Source and URLs.

Newsscraper - A simple Python 3 module to get crypto or news articles and their content from various RSS feeds.

A Happy and lightweight Python Package that searches Google News RSS Feed and returns a usable JSON response and scrap complete article - No need to write scrappers for articles fetching anymore

Dictionary - Application focused on word search through web scraping

A tool for scraping and organizing data from NewsBank API searches

Scrapegoat is a python library that can be used to scrape the websites from internet based on the relevance of the given topic irrespective of language using Natural Language Processing

A Web Scraper built with beautiful soup, that fetches udemy course information. Get udemy course information and convert it to json, csv or xml file

Incredibly fast crawler designed for OSINT.

A Python package that scrapes Google News article data while remaining undetected by Google.

This repo has the source code for the crawler and data crawled from auto-data.net

A Python web scraper to scrape latest posts from official Coinbase's Blog.

京东茅台抢购最新优化版本，京东茅台秒杀，优化了茅台抢购进程队列