Similar looking domain detection using python fuzzywuzzy

Overview

Similar-looking-domain-detection-using-python-fuzzywuzzy

Major cause of phishing and BEC incident is similar looking domain, if you detect it early, you can prevent incidents early, python fuzzywuzzy module let you do that and here is the process.

By statistics every day thousands of domains are registered, some are use for legit purpose and some are not. BEC incidents incresing every day and cost millions to businesses, the core of BEC is spoofed email that looks similar to your business email. Because of these similar looking domain we fall pray to BEC incidents. Sometimes we end up submitting our credential when we received any email having link that looks like genuine website. e.g. microsoft.com vs micr0soft.com.

How can we detect/prevent such incident?

In python you have module named "fuzzywuzzy" that looks for similarity in strings and gives score of how similar strings are, like 90% match, 66% match. Use that to look for simiar domains

  1. Gather data (SIEM) having domain related information e.g. Proxy logs, DNS logs, Mail logs.
  2. Have list of domains related to your business ( your owned domain, list of vendor domains with whom you carry out business )
  3. Now run fuzz module against this data and check ratio which is more than 50% ( e.g. given below )
  4. Do the analysis of domains (check whois data) which looks similar to your domain, if genuine add to list gathered in Step 2
  5. If domain is not genuine, start digging more on that domain, like any email received(mail logs), any user visited the domain(proxy logs)
  6. You can run such checks on hourly, daily basis.... thats it.

Here are coule of examples.

  1. Basic example

    from fuzzywuzzy import fuzz
    a = "microsoft.com"
    b = "micros0ft.com"
    print("Match ratio is ", str(fuzz.ratio(a, b)), "%") // fuzz.ration(a,b) function gives you match score

  2. Working code

    from fuzzywuzzy import fuzz

    dns_data = open(r'/home/user/Desktop/BEC/your_domain.txt','r') # List of genuine domains owned by you
    output = open(r'/home/user/Desktop/BEC/output.txt','w') # Output file

    for dns in dns_data:

    domain = open(r'/home/user/Desktop/BEC/domain-names-data.txt','r') # domain data gathered from proxy/dns/mail logs
    for site in domain:

    if ( fuzz.ratio(site.rstrip(),dns.rstrip()) > 80 ):

    output.write("Match ratio is: \t" + dns.rstrip() + "\t" + site.rstrip() + "\t" + str(fuzz.ratio(site.rstrip(),dns.rstrip())))
    output.write("\n")

If you have access to whois database then you can run this code against newly registered domain everyday and probably you can get the result early!!!

I have run this code against newly registered doamin on 3rd Nov. Legit domains considered are top 1000 domains. Results are amazing as to how many similar looking domains are registered everyday and no wonder we receive lot of offerers from amzon apples :) Check out Output.txt file

Feel free to share your thoughts!!!

DiddiParser 2: The DiddiScript parser.

DiddiParser 2 The DiddiScript parser, written in Python. Installation DiddiParser2 can be installed via pip: pip install diddiparser2 Usage DiddiPars

Diego Ramirez 3 Dec 28, 2022
Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.

Lark - Parsing Library & Toolkit 3.5k Jan 05, 2023
Extract the download URL from OneDrive or SharePoint share link and push it to aria2

OneDriveShareLinkPushAria2 Extract the download URL from OneDrive or SharePoint share link and push it to aria2 从OneDrive或SharePoint共享链接提取下载URL并将其推送到a

高玩梁 262 Jan 08, 2023
Python script to launch burp scans automatically

SimpleAutoBurp Python script that takes a config.json file as config and uses Burp Suite Pro to scan a list of websites.

Adan Álvarez 26 Jul 18, 2022
A python module to manipulate XCode projects

This module can read, modify, and write a .pbxproj file from an Xcode 4+ projects. The file is usually called project.pbxproj and can be found inside the .xcodeproj bundle. Because some task cannot b

Ignacio Calderon 1.1k Jan 02, 2023
Python library to decorate and beautify strings

outputformat Python library to decorate and beautify your standard output 💖 Ins

Felipe Delestro Matos 259 Dec 13, 2022
This is a python table of data implementation with styles, colors

Table This is a python table of data implementation with styles, colors Example Table adapts to the lack of data Lambda color features Full power of l

Урядов Алексей 5 Nov 09, 2021
Convert any-bit number to decimal number and vise versa.

2deci Convert any-bit number to decimal number and vise versa. --bit n to set bit to n --exp xxx to set expression to xxx --r to run reversely (from d

3 Sep 15, 2021
This utility lets you draw using your laptop's touchpad on Linux.

FingerPaint This utility lets you draw using your laptop's touchpad on Linux. Pressing any key or clicking the touchpad will finish the drawing

Wazzaps 95 Dec 17, 2022
ticktock is a minimalist library to view Python time performance of Python code.

ticktock is a minimalist library to view Python time performance of Python code.

Victor Benichoux 30 Sep 28, 2022
Small project to interact with python, C, HTML, JavaScript, PHP.

Micro Hidroponic Small project to interact with python, C, HTML, JavaScript, PHP. Table of Contents General Info Technologies Used Screenshots Usage P

Filipe Martins 1 Nov 10, 2021
A meme error handler for python

Pwython OwO what's this? Pwython is project aiming to fill in one of the biggest problems with python, which is that it is slow lacks owoified text. N

SystematicError 23 Jan 15, 2022
Early version for manipulate Geo localization data trough API REST.

Backend para obtener los datos (beta) Descripción El servidor está diseñado para recibir y almacenar datos enviados en forma de JSON por una aplicació

Víctor Omar Vento Hernández 1 Nov 14, 2021
This repository contains scripts that help you validate QR codes.

Validation tools This repository contains scripts that help you validate QR codes. It's hacky, and a warning for Apple Silicon users: the dependencies

Ryan Barrett 8 Mar 01, 2022
Rabbito is a mini tool to find serialized objects in input values

Rabbito-ObjectFinder Rabbito is a mini tool to find serialized objects in input values What does Rabbito do Rabbito has the main object finding Serial

7 Dec 13, 2021
Color getter (including method to get random color or complementary color) made out of Python

python-color-getter Color getter (including method to get random color or complementary color) made out of Python Setup pip3 install git+https://githu

Jung Gyu Yoon 2 Sep 17, 2022
Script to decrypt / import chromium (edge/chrome) cookies

Cloonie Script to decrypt / import chromium (edge/chrome) cookies. Requirements Install the python dependencies via pip: pip install -r requirements.t

Lorenzo Bernardi 5 Sep 13, 2022
Python implementation of Gorilla time series compression

Gorilla Time Series Compression This is an implementation (with some adaptations) of the compression algorithm described in section 4.1 (Time series c

Ghiles Meddour 19 Jan 01, 2023
.bvh to .mcfunction file converter.

bvh-to-mcf .bvh file to .mcfunction converter

Hanmin Kim 28 Nov 21, 2022
Spacegit is a .git exposed finder

Spacegit Spacegit is a basic .git exposed finder Usage: You need python3 installed to run spacegit use: python3 spacegit.py (url) Disclaimer: **This i

2 Nov 30, 2021