Extracts data from the database for a graph-node and stores it in parquet files

Overview

subgraph-extractor

Extracts data from the database for a graph-node and stores it in parquet files

Installation

For developing, it's recommended to use conda to create an environment.

Create one with python 3.9

conda create --name subgraph-extractor python=3.9

Now activate it

conda activate subgraph-extractor

Install the dev packages (note there is no space after the .)

pip install -e .[dev]

Use

Now you can use the main entrypoint, see help for more details

subgraph_extractor --help

Creating a config files

The easiest way to start is to use the interactive subgraph config generator.

Start by launching the subgraph config generator with the location you want to write the config file to.

subgraph_config_generator --config-location subgraph_config.yaml

It will default to using a local graph-node with default username & password (postgresql://graph-node:[email protected]:5432/graph-node) If you are connecting to something else you need to specify the database connection string with --database-string.

You will then be asked to select:

  • The relevant subgraph
  • From the subgraph, which tables to extract (multi-select)
  • For each table, which column to partition on (this is typically the block number or timestamp)
  • Any numeric columns that require mapping to another type * see note below

Numeric column mappings

Uint256 is a common data type in contracts but rare in most data processing tools. The graph node creates a Postgres Numeric column for any field marked as a BigInt as it is capable of accurately storing uint256s (a common data type in solidity).

However, many downstream tools cannot handle these as numbers.

By default, these columns will be exported as bytes - a lossless representation but one that is not as usable for sums, averages, etc. This is fine for some data, such as addresses or where the field is used to pack data (e.g. the tokenIds for decentraland).

For other use cases, the data must be converted to another type. In the config file, you can specify numeric columns that need to be mapped to another type:

column_mappings:
  my_original_column_name:
    my_new_column_name:
      type: uint64

However, if the conversion does not work (e.g. the number is too large), the extraction will stop with an error. This is fine for cases where you know the range (e.g. timestamp or block number). For other cases you can specify a maximum value, default and a column to store whether the row was at most the maximum value:

column_mappings:
  my_original_column_name:
    my_new_column_name:
      type: uint64
      max_value: 18446744073709551615
      default: 0
      validity_column: new_new_column_name_valid

If the number is over 18446744073709551615, there will be a 0 stored in the column my_new_column_name and FALSE stored in new_new_column_name_valid.

If your numbers are too large but can be safely lowered for your usecase (e.g. converting from wei to gwei) you can provide a downscale value:

column_mappings:
  transfer_fee_wei:
    transfer_fee_gwei:
      downscale: 1000000000
      type: uint64
      max_value: 18446744073709551615
      default: 0
      validity_column: transfer_fee_gwei_valid

This will perform an integer division (divide and floor) the original value. WARNING this is a lossy conversion.

You may have as many mappings for a single column as you want, and the original will always be present as bytes.

The following numeric types are allowed:

  • int8, int16, int32, int64
  • uint8, uint16, uint32, uint64
  • float32, float64
  • Numeric38 (this is a numeric/Decimal column with 38 digits of precision)

Contributing

Please format everything with black and isort

black . && isort --profile=black .
Owner
Cardstack
Experience Web 3.0.
Cardstack
Official Implementation for the paper DeepFace-EMD: Re-ranking Using Patch-wise Earth Mover’s Distance Improves Out-Of-Distribution Face Identification

DeepFace-EMD: Re-ranking Using Patch-wise Earth Mover’s Distance Improves Out-Of-Distribution Face Identification Official Implementation for the pape

Anh M. Nguyen 36 Dec 28, 2022
Official implementation of the paper Chunked Autoregressive GAN for Conditional Waveform Synthesis

PyEmits, a python package for easy manipulation in time-series data. Time-series data is very common in real life. Engineering FSI industry (Financial

Descript 150 Dec 06, 2022
Universal Adversarial Triggers for Attacking and Analyzing NLP (EMNLP 2019)

Universal Adversarial Triggers for Attacking and Analyzing NLP This is the official code for the EMNLP 2019 paper, Universal Adversarial Triggers for

Eric Wallace 248 Dec 17, 2022
Python scripts for performing stereo depth estimation using the HITNET Tensorflow model.

HITNET-Stereo-Depth-estimation Python scripts for performing stereo depth estimation using the HITNET Tensorflow model from Google Research. Stereo de

Ibai Gorordo 76 Jan 02, 2023
Poplar implementation of "Bundle Adjustment on a Graph Processor" (CVPR 2020)

Poplar Implementation of Bundle Adjustment using Gaussian Belief Propagation on Graphcore's IPU Implementation of CVPR 2020 paper: Bundle Adjustment o

Joe Ortiz 34 Dec 05, 2022
Implementation of NÜWA, state of the art attention network for text to video synthesis, in Pytorch

NÜWA - Pytorch (wip) Implementation of NÜWA, state of the art attention network for text to video synthesis, in Pytorch. This repository will be popul

Phil Wang 463 Dec 28, 2022
Code for Subgraph Federated Learning with Missing Neighbor Generation (NeurIPS 2021)

To run the code Unzip the package to your local directory; Run 'pip install -r requirements.txt' to download required packages; Open file ~/nips_code/

32 Dec 26, 2022
Official implementation of NeurIPS'21: Implicit SVD for Graph Representation Learning

isvd Official implementation of NeurIPS'21: Implicit SVD for Graph Representation Learning If you find this code useful, you may cite us as: @inprocee

Sami Abu-El-Haija 16 Jan 08, 2023
Txt2Xml tool will help you convert from txt COCO format to VOC xml format in Object Detection Problem.

TXT 2 XML All codes assume running from root directory. Please update the sys path at the beginning of the codes before running. Over View Txt2Xml too

Nguyễn Trường Lâu 4 Nov 24, 2022
Implementation of ResMLP, an all MLP solution to image classification, in Pytorch

ResMLP - Pytorch Implementation of ResMLP, an all MLP solution to image classification out of Facebook AI, in Pytorch Install $ pip install res-mlp-py

Phil Wang 178 Dec 02, 2022
Just Randoms Cats with python

Random-Cat Just Randoms Cats with python.

OriCode 2 Dec 21, 2021
Code release for NeRF (Neural Radiance Fields)

NeRF: Neural Radiance Fields Project Page | Video | Paper | Data Tensorflow implementation of optimizing a neural representation for a single scene an

6.5k Jan 01, 2023
Selfplay In MultiPlayer Environments

This project allows you to train AI agents on custom-built multiplayer environments, through self-play reinforcement learning.

200 Jan 08, 2023
IhoneyBakFileScan Modify - 批量网站备份文件扫描器,增加文件规则,优化内存占用

ihoneyBakFileScan_Modify 批量网站备份文件泄露扫描工具 2022.2.8 添加、修改内容 增加备份文件fuzz规则 修改备份文件大小判断

VMsec 220 Jan 05, 2023
The official implementation of paper Siamese Transformer Pyramid Networks for Real-Time UAV Tracking, accepted by WACV22

SiamTPN Introduction This is the official implementation of the SiamTPN (WACV2022). The tracker intergrates pyramid feature network and transformer in

Robotics and Intelligent Systems Control @ NYUAD 28 Nov 25, 2022
Portfolio asset allocation strategies: from Markowitz to RNNs

Portfolio asset allocation strategies: from Markowitz to RNNs Research project to explore different approaches for optimal portfolio allocation starti

Luigi Filippo Chiara 1 Feb 05, 2022
Model Zoo of BDD100K Dataset

Model Zoo of BDD100K Dataset

ETH VIS Group 200 Dec 27, 2022
TLXZoo - Pre-trained models based on TensorLayerX

Pre-trained models based on TensorLayerX. TensorLayerX is a multi-backend AI fra

TensorLayer Community 13 Dec 07, 2022
COVID-Net Open Source Initiative

The COVID-Net models provided here are intended to be used as reference models that can be built upon and enhanced as new data becomes available

Linda Wang 1.1k Dec 26, 2022
Deep Reinforcement Learning for Multiplayer Online Battle Arena

MOBA_RL Deep Reinforcement Learning for Multiplayer Online Battle Arena Prerequisite Python 3 gym-derk Tensorflow 2.4.1 Dotaservice of TimZaman Seed R

Dohyeong Kim 32 Dec 18, 2022