This repository contains a set of benchmarks of different implementations of Parquet (storage format) <-> Arrow (in-memory format).

Overview

Parquet benchmarks

This repository contains a set of benchmarks of different implementations of Parquet (storage format) <-> Arrow (in-memory format).

The results on Azure's Standard D4s v3 (4 vcpus, 16 GiB memory) are available here.

Read uncompressed

read uncompressed i64

read uncompressed bool

read uncompressed utf8

(Note: neither pyarrow nor arrow validate utf8, which can result in undefined behavior.)

read uncompressed dict utf8

(Note: neither pyarrow nor arrow validate utf8, which can result in undefined behavior.)

Read compressed (snappy)

read compressed i64

read compressed bool

read compressed utf8

(Note: neither pyarrow nor arrow validate utf8, which can result in undefined behavior.)

Write uncompressed

write uncompressed i64

write uncompressed bool

write uncompressed utf8

Write compressed (snappy)

write compressed i64

write compressed bool

write compressed utf8

(Note: neither pyarrow nor arrow validate utf8, which can result in undefined behavior.)

Run benchmarks

To reproduce, use

python3 -m venv venv
venv/bin/pip install -U pip
venv/bin/pip install pyarrow

# create files
venv/bin/python write_parquet.py

# run benchmarks
venv/bin/python run.py

# print results to stdout as csv
venv/bin/python summarize.py

Details

The benchmark reads a single column from a file pre-loaded into memory, decompresses and deserializes the column to an arrow array.

The benchmark includes different configurations:

  • dictionary-encoded vs plain encoding
  • single page vs multiple pages
  • compressed vs uncompressed
  • different types:
    • i64
    • bool
    • utf8
Test scripts etc. for experimental rollup testing

rollup node experiments Test scripts etc. for experimental rollup testing. untested, work in progress python -m venv venv source venv/bin/activate #

Diederik Loerakker 14 Jan 25, 2022
Python Moonlight (Machine Learning) Practice

PyML Python Moonlight (Machine Learning) Practice Contents Design Documentation Prerequisites Checklist Dev Setup Testing Run Prerequisites Python 3 P

Dockerian Seattle 2 Dec 25, 2022
Scalable user load testing tool written in Python

Locust Locust is an easy to use, scriptable and scalable performance testing tool. You define the behaviour of your users in regular Python code, inst

Locust.io 20.4k Jan 04, 2023
Front End Test Automation with Pytest Framework

Front End Test Automation Framework with Pytest Installation and running instructions: 1. To install the framework on your local machine: clone the re

Sergey Kolokolov 2 Jun 17, 2022
LuluTest is a Python framework for creating automated browser tests.

LuluTest LuluTest is an open source browser automation framework using Python and Selenium. It is relatively lightweight in that it mostly provides wr

Erik Whiting 14 Sep 26, 2022
Python Webscraping using Selenium

Web Scraping with Python and Selenium The code shows how to do web scraping using Python and Selenium. We use as data the https://sbot.org.br/localize

Luís Miguel Massih Pereira 1 Dec 01, 2021
Green is a clean, colorful, fast python test runner.

Green -- A clean, colorful, fast python test runner. Features Clean - Low redundancy in output. Result statistics for each test is vertically aligned.

Nathan Stocks 756 Dec 22, 2022
Faker is a Python package that generates fake data for you.

Faker is a Python package that generates fake data for you. Whether you need to bootstrap your database, create good-looking XML documents, fill-in yo

Daniele Faraglia 15.2k Jan 01, 2023
Statistical tests for the sequential locality of graphs

Statistical tests for the sequential locality of graphs You can assess the statistical significance of the sequential locality of an adjacency matrix

2 Nov 23, 2021
A folder automation made using Watch-dog, it only works in linux for now but I assume, it will be adaptable to mac and PC as well

folder-automation A folder automation made using Watch-dog, it only works in linux for now but I assume, it will be adaptable to mac and PC as well Th

Parag Jyoti Paul 31 May 28, 2021
Code for "SUGAR: Subgraph Neural Network with Reinforcement Pooling and Self-Supervised Mutual Information Mechanism"

SUGAR Code for "SUGAR: Subgraph Neural Network with Reinforcement Pooling and Self-Supervised Mutual Information Mechanism" Overview train.py: the cor

41 Nov 08, 2022
Hypothesis is a powerful, flexible, and easy to use library for property-based testing.

Hypothesis Hypothesis is a family of testing libraries which let you write tests parametrized by a source of examples. A Hypothesis implementation the

Hypothesis 6.4k Jan 05, 2023
Yet another python home automation project. Because a smart light is more than just on or off

Automate home Yet another home automation project because a smart light is more than just on or off. Overview When talking about home automation there

Maja Massarini 62 Oct 10, 2022
Python Testing Crawler 🐍 🩺 🕷️ A crawler for automated functional testing of a web application

Python Testing Crawler 🐍 🩺 🕷️ A crawler for automated functional testing of a web application Crawling a server-side-rendered web application is a

70 Aug 07, 2022
a plugin for py.test that changes the default look and feel of py.test (e.g. progressbar, show tests that fail instantly)

pytest-sugar pytest-sugar is a plugin for pytest that shows failures and errors instantly and shows a progress bar. Requirements You will need the fol

Teemu 963 Dec 28, 2022
Fidelipy - Semi-automated trading on fidelity.com

fidelipy fidelipy is a simple Python 3.7+ library for semi-automated trading on fidelity.com. The scope is limited to the Trade Stocks/ETFs simplified

Darik Harter 8 May 10, 2022
An AWS Pentesting tool that lets you use one-liner commands to backdoor an AWS account's resources with a rogue AWS account - or share the resources with the entire internet 😈

An AWS Pentesting tool that lets you use one-liner commands to backdoor an AWS account's resources with a rogue AWS account - or share the resources with the entire internet 😈

Brandon Galbraith 276 Mar 03, 2021
pytest plugin for a better developer experience when working with the PyTorch test suite

pytest-pytorch What is it? pytest-pytorch is a lightweight pytest-plugin that enhances the developer experience when working with the PyTorch test sui

Quansight 39 Nov 18, 2022
Silky smooth profiling for Django

Silk Silk is a live profiling and inspection tool for the Django framework. Silk intercepts and stores HTTP requests and database queries before prese

Jazzband 3.7k Jan 04, 2023
AutoExploitSwagger is an automated API security testing exploit tool that can be combined with xray, BurpSuite and other scanners.

AutoExploitSwagger is an automated API security testing exploit tool that can be combined with xray, BurpSuite and other scanners.

6 Jan 28, 2022