This repository contains a set of benchmarks of different implementations of Parquet (storage format) <-> Arrow (in-memory format).

Overview

Parquet benchmarks

This repository contains a set of benchmarks of different implementations of Parquet (storage format) <-> Arrow (in-memory format).

The results on Azure's Standard D4s v3 (4 vcpus, 16 GiB memory) are available here.

Read uncompressed

read uncompressed i64

read uncompressed bool

read uncompressed utf8

(Note: neither pyarrow nor arrow validate utf8, which can result in undefined behavior.)

read uncompressed dict utf8

(Note: neither pyarrow nor arrow validate utf8, which can result in undefined behavior.)

Read compressed (snappy)

read compressed i64

read compressed bool

read compressed utf8

(Note: neither pyarrow nor arrow validate utf8, which can result in undefined behavior.)

Write uncompressed

write uncompressed i64

write uncompressed bool

write uncompressed utf8

Write compressed (snappy)

write compressed i64

write compressed bool

write compressed utf8

(Note: neither pyarrow nor arrow validate utf8, which can result in undefined behavior.)

Run benchmarks

To reproduce, use

python3 -m venv venv
venv/bin/pip install -U pip
venv/bin/pip install pyarrow

# create files
venv/bin/python write_parquet.py

# run benchmarks
venv/bin/python run.py

# print results to stdout as csv
venv/bin/python summarize.py

Details

The benchmark reads a single column from a file pre-loaded into memory, decompresses and deserializes the column to an arrow array.

The benchmark includes different configurations:

  • dictionary-encoded vs plain encoding
  • single page vs multiple pages
  • compressed vs uncompressed
  • different types:
    • i64
    • bool
    • utf8
masscan + nmap 快速端口存活检测和服务识别

masnmap masscan + nmap 快速端口存活检测和服务识别。 思路很简单,将masscan在端口探测的高速和nmap服务探测的准确性结合起来,达到一种相对比较理想的效果。 先使用masscan以较高速率对ip存活端口进行探测,再以多进程的方式,使用nmap对开放的端口进行服务探测。 安

starnightcyber 75 Dec 19, 2022
A Python program that will log into your scheduled Google Meets hands free

Chrome GMautomation General Information This Python program will open up Chrome and log into your scheduled Google Meet with camera and mic turned off

Jonathan Leow 5 Dec 31, 2021
Multi-asset backtesting framework. An intuitive API lets analysts try out their strategies right away

Multi-asset backtesting framework. An intuitive API lets analysts try out their strategies right away. Fast execution of profit-take/loss-cut orders is built-in. Seamless with Pandas.

Epymetheus 39 Jan 06, 2023
Load Testing ML Microservices for Robustness and Scalability

The demo is aimed at getting started with load testing a microservice before taking it to production. We use FastAPI microservice (to predict weather) and Locust to load test the service (locally or

Emmanuel Raj 13 Jul 05, 2022
Mixer -- Is a fixtures replacement. Supported Django, Flask, SqlAlchemy and custom python objects.

The Mixer is a helper to generate instances of Django or SQLAlchemy models. It's useful for testing and fixture replacement. Fast and convenient test-

Kirill Klenov 871 Dec 25, 2022
Tattoo - System for automating the Gentoo arch testing process

Naming origin Well, naming things is very hard. Thankfully we have an excellent

Arthur Zamarin 4 Nov 07, 2022
hCaptcha solver and bypasser for Python Selenium. Simple website to try to solve hCaptcha.

hCaptcha solver for Python Selenium. Many thanks to engageub for his hCaptcha solver userscript. This script is solely intended for the use of educati

Maxime Dréan 59 Dec 25, 2022
catsim - Computerized Adaptive Testing Simulator

catsim - Computerized Adaptive Testing Simulator Quick start catsim is a computerized adaptive testing simulator written in Python 3.4 (with modificat

Nguyễn Văn Anh Tuấn 1 Nov 29, 2021
Automated Penetration Testing Framework

Automated Penetration Testing Framework

OWASP 2.1k Jan 01, 2023
The pytest framework makes it easy to write small tests, yet scales to support complex functional testing

The pytest framework makes it easy to write small tests, yet scales to support complex functional testing for applications and libraries. An example o

pytest-dev 9.6k Jan 02, 2023
Um scraper feito em python que gera arquivos de excel baseados nas tier lists do site LoLalytics.

LoLalytics-scraper Um scraper feito em python que gera arquivos de excel baseados nas tier lists do site LoLalytics. Começando por um único script com

Kevin Souza 1 Feb 19, 2022
Green is a clean, colorful, fast python test runner.

Green -- A clean, colorful, fast python test runner. Features Clean - Low redundancy in output. Result statistics for each test is vertically aligned.

Nathan Stocks 756 Dec 22, 2022
Kent - Fake Sentry server for local development, debugging, and integration testing

Kent is a service for debugging and integration testing Sentry.

Will Kahn-Greene 100 Dec 15, 2022
pytest plugin for distributed testing and loop-on-failures testing modes.

xdist: pytest distributed testing plugin The pytest-xdist plugin extends pytest with some unique test execution modes: test run parallelization: if yo

pytest-dev 1.1k Dec 30, 2022
BDD library for the py.test runner

BDD library for the py.test runner pytest-bdd implements a subset of the Gherkin language to enable automating project requirements testing and to fac

pytest-dev 1.1k Jan 09, 2023
Python version of the Playwright testing and automation library.

🎭 Playwright for Python Docs | API Playwright is a Python library to automate Chromium, Firefox and WebKit browsers with a single API. Playwright del

Microsoft 7.8k Jan 02, 2023
Hamcrest matchers for Python

PyHamcrest Introduction PyHamcrest is a framework for writing matcher objects, allowing you to declaratively define "match" rules. There are a number

Hamcrest 684 Dec 29, 2022
Android automation project with pytest+appium

Android automation project with pytest+appium

1 Oct 28, 2021
AutoExploitSwagger is an automated API security testing exploit tool that can be combined with xray, BurpSuite and other scanners.

AutoExploitSwagger is an automated API security testing exploit tool that can be combined with xray, BurpSuite and other scanners.

6 Jan 28, 2022