Block fingerprinting for the beacon chain, for client identification & client diversity metrics

Overview

blockprint

This is a repository for discussion and development of tools for Ethereum block fingerprinting.

The primary aim is to measure beacon chain client diversity using on-chain data, as described in this tweet:

https://twitter.com/sproulM_/status/1440512518242197516

The latest estimate using the improved k-NN classifier for slots 2048001 to 2164916 is:

Getting Started

The raw data for block fingerprinting needs to be sourced from Lighthouse's block_rewards API.

This is a new API that is currently only available on the block-rewards-api branch, i.e. this pull request: https://github.com/sigp/lighthouse/pull/2628

Lighthouse can be built from source by following the instructions here.

VirtualEnv

All Python commands should be run from a virtualenv with the dependencies from requirements.txt installed.

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

k-NN Classifier

The best classifier implemented so far is a k-nearest neighbours classifier in knn_classifier.py.

It requires a directory of structered training data to run, and can be used either via a small API server, or in batch mode.

You can download a large (886M) training data set here.

To run in batch mode against a directory of JSON batches (individual files downloaded from LH), use this command:

./knn_classifier.py training_data_proc data_to_classify

Expected output is:

classifier score: 0.9886800869904645
classifying rewards from file slot_2048001_to_2050048.json
total blocks processed: 2032
Lighthouse,0.2072
Nimbus or Prysm,0.002
Nimbus or Teku,0.0025
Prysm,0.6339
Prysm or Teku,0.0241
Teku,0.1304

Training the Classifier

The classifier is trained from a directory of reward batches. You can fetch batches with the load_blocks.py script by providing a start slot, end slot and output directory:

./load_blocks.py 2048001 2048032 testdata

The directory testdata now contains 1 or more files of the form slot_X_to_Y.json downloaded from Lighthouse.

To train the classifier on this data, use the prepare_training_data.py script:

./prepare_training_data.py testdata testdata_proc

This will read files from testdata and write the graffiti-classified training data to testdata_proc, which is structured as directories of single block reward files for each client.

$ tree testdata_proc
testdata_proc
├── Lighthouse
│   ├── 0x03ae60212c73bc2d09dd3a7269f042782ab0c7a64e8202c316cbcaf62f42b942.json
│   └── 0x5e0872a64ea6165e87bc7e698795cb3928484e01ffdb49ebaa5b95e20bdb392c.json
├── Nimbus
│   └── 0x0a90585b2a2572305db37ef332cb3cbb768eba08ad1396f82b795876359fc8fb.json
├── Prysm
│   └── 0x0a16c9a66800bd65d997db19669439281764d541ca89c15a4a10fc1782d94b1c.json
└── Teku
    ├── 0x09d60a130334aa3b9b669bf588396a007e9192de002ce66f55e5a28309b9d0d3.json
    ├── 0x421a91ebdb650671e552ce3491928d8f78e04c7c9cb75e885df90e1593ca54d6.json
    └── 0x7fedb0da9699c93ce66966555c6719e1159ae7b3220c7053a08c8f50e2f3f56f.json

You can then use this directory as the first argument to ./knn_classifier.py.

Classifier API

With pre-processed training data installed in ./training_data_proc, you can host a classification API server like this:

gunicorn --reload api_server --timeout 1800

It will take a few minutes to start-up while it loads all of the training data into memory.

Initialising classifier, this could take a moment...
Start-up complete, classifier score is 0.9886800869904645

Once it has started up, you can make POST requests to the /classify endpoint containing a single JSON-encoded block reward. There is an example input file in examples.

curl -s -X POST -H "Content-Type: application/json" --data @examples/single_teku_block.json "http://localhost:8000/classify"

The response is of the following form:

{
  "block_root": "0x421a91ebdb650671e552ce3491928d8f78e04c7c9cb75e885df90e1593ca54d6",
  "best_guess_single": "Teku",
  "best_guess_multi": "Teku",
  "probability_map": {
    "Lighthouse": 0.0,
    "Nimbus": 0.0,
    "Prysm": 0.0,
    "Teku": 1.0
  }
}
  • best_guess_single is the single client that the classifier deemed most likely to have proposed this block.
  • best_guess_multi is a list of 1-2 client guesses. If the classifier is more than 95% sure of a single client then the multi guess will be the same as best_guess_single. Otherwise it will be a string of the form "Lighthouse or Teku" with 2 clients in lexicographic order. 3 client splits are never returned.
  • probability_map is a map from each known client label to the probability that the given block was proposed by that client.

TODO

  • Improve the classification algorithm using better stats or machine learning (done, k-NN).
  • Decide on data representations and APIs for presenting data to a frontend (done).
  • Implement a web backend for the above API (done).
  • Polish and improve all of the above.
Owner
Sigma Prime
Blockchain & Information Security Services
Sigma Prime
A webapp for taking fast notes, designed for business, school, and collaboration with groups.

JOTS Journal of the Session A webapp for taking fast notes, designed for business, school, and collaboration with groups.

Zebadiah S. Taylor 2 Jun 10, 2022
Customisable coding font with alternates, ligatures and contextual positioning

Guide Ligature Support Links Log License Guide Live Preview + Download larsenwork.com/monoid Install Quit your editor/program. Unzip and open the fold

Andreas Larsen 7.6k Dec 30, 2022
The blancmange curve can be visually built up out of triangle wave functions if the infinite sum is approximated by finite sums of the first few terms.

Blancmange-curve The blancmange curve can be visually built up out of triangle wave functions if the infinite sum is approximated by finite sums of th

Shankar Mahadevan L 1 Nov 30, 2021
A curses based mpd client with basic functionality and album art.

Miniplayer A curses based mpd client with basic functionality and album art. After installation, the player can be opened from the terminal with minip

Tristan Ferrua 102 Dec 24, 2022
The tool helps to find hidden parameters that can be vulnerable or can reveal interesting functionality that other hunters miss.

The tool helps to find hidden parameters that can be vulnerable or can reveal interesting functionality that other hunters miss. Greater accuracy is achieved thanks to the line-by-line comparison of

197 Nov 14, 2022
Live tracking, flight database and competition framework

SkyLines SkyLines is a web platform where pilots can share their flights with others after, or even during flight via live tracking. SkyLines is a sor

SkyLines 367 Dec 27, 2022
Weakly-Divisable - Takes an interger and seee if it is weakly divisible by seven

Weakly Divisble Project by Diana Arce-Hernandez, Ryan McAlpine, and Rommel Ravan

Diana Arce-Hernandez 1 Jan 12, 2022
Writeup of NilbinSec's participation in the Winja CTF for c0c0n 2021

Winja-CTF-c0c0n-2021-Writeup NilbinSec's participation in the Winja CTF for c0c0n 2021 This repo covers NilbinSec's participation in the Winja CTF dur

1 Nov 15, 2021
Python code to control laboratory hardware and perform Bayesian reaction optimization on the MIT Make-It system for chemical synthesis

Description This repository contains code accompanying the following paper on the Make-It robotic flow chemistry platform developed by the Jensen Rese

Anirudh Nambiar 11 Dec 10, 2022
The only purpose of a byte-sized application is to help you create .desktop entry files for downloaded applications.

Turtle 🐢 The only purpose of a byte-sized application is to help you create .desktop entry files for downloaded applications. As of usual with elemen

TenderOwl 14 Dec 29, 2022
Astroquery is an astropy affiliated package that contains a collection of tools to access online Astronomical data.

Astroquery is an astropy affiliated package that contains a collection of tools to access online Astronomical data.

The Astropy Project 631 Jan 05, 2023
8 Nov 04, 2022
This tool don't used illegal ativity

ETHICALTOOL This tool for only educational purposes don't used illegal ativity @onlinehacking this tool for pkg update && pkg upgrade && pkg install g

Mrkarthick 4 Dec 23, 2021
Simplest way to find Appointments in Bürgeramt Berlin, Not over engineered.

Simplest way to find Appointments in Bürgeramt Berlin, Not over engineered. Der einfachste Weg, Termine im Bürgeramt Berlin zu finden, ohne viel Schnickschnack.

Jannis 8 Nov 25, 2022
A Company Management System For Python

campany-management Getting started To make it easy for you to get started with GitLab, here's a list of recommended next steps. Already a pro? Just ed

hatice akpınar 3 Aug 29, 2022
Extremely unfinished animation toolset for Blender 3.

AbraTools Alpha IMPORTANT: Code is a mess. Be careful using it in production. Bug reports, feature requests and PRs are appreciated. Download AbraTool

Abra 15 Dec 17, 2022
Swubcase - The shitty programming language

What is Swubcase? Swubcase is easy-to-use programming language that can fuck you

5 Jun 19, 2022
banking system with python, beginner friendly, preadvanced level

banking-system-python banking system with python, beginner friendly, preadvanced level Used topics Functions else/if/elif dicts methods parameters hol

Razi Falah 1 Feb 03, 2022
データサイエンスチャレンジ2021 サンプル

データサイエンスチャレンジ2021 サンプル 概要 線形補間と Catmull–Rom Spline 補間のサンプル Python スクリプトです。 データサイエンスチャレンジ2021の出題意図としましては、訓練用データ(train.csv)から機械学習モデルを作成して、そのモデルに推論させてモーシ

Bandai Namco Research Inc. 5 Oct 17, 2022