Simple integer-valued time series bit packing

Overview

Smahat Time Series Encoding

Smahat allows to encode a sequence of integer values using a fixed (for all values) number of bits but minimal with regards to the data range. For example: for a series of boolean values only one bit is needed, for a series of integer percentages 7 bits are needed, etc.

Smahat is useful when:

  • Time series is integer-valued. (It doesn't work with floats :))
  • The range of the data is known in advance (if not streaming, this is not necessary).
  • The data range is relatively small.
  • The data does not have properties that would make other compression algorithms useful, or these other algorithms have an unacceptable cost for the use case.

Smahat can also be used as a baseline to calculate the true compression ratio of a compression algorithm on data of a certain nature.

Installation

To install the latest release:

$ pip install smahat

You can also build a local package and install it:

$ make build
$ pip install dist/*.whl

Usage

Import smahat module.

>>> import smahat

Data to encode.

>>> values = [12, 0, 17, 15, 78, 10]

Encoding

You can use encode_next to encode one value by one:

>>> encoder = smahat.Encoder(min_value=0, max_value=100, strategy='saturate')
>>> for v in values:
...     encoder.encode_next(v)
>>> content = encoder.get_encoded()
>>> content
{'encoded': b'\x18\x00\x88\xf9\xc2\x80', 'shift': 0, 'n_bits_per_value': 7, 'n_padding_bits': 6}

Or you can use Encoder.encode_all to encode all values (range min and max will be inferred from values if not provided):

>>> content = smahat.Encoder.encode_all(values, min_value=0, max_value=100, strategy='saturate')
>>> content
{'encoded': b'\x18\x00\x88\xf9\xc2\x80', 'shift': 0, 'n_bits_per_value': 7, 'n_padding_bits': 6}

Decoding

To decode use Decoder.decode_all.

>>> smahat.Decoder.decode_all(content)
[12, 0, 17, 15, 78, 10]

Encoding result

The result of the encoding of a sequence of values using Smahat is a SmahatContent dictionary containing the encoded data, plus three fields : shift is used to bring the data range to start from zero (values are shifted and encoded in pure binary), n_bits_per_value indicates the number of bits used to encode each value, n_padding_bits (between 0 and 7) indicates the number of unused padding bits within the last byte.

class SmahatContent(TypedDict):
    encoded: bytes
    shift: int
    n_bits_per_value: int
    n_padding_bits: int

If you want to use this library for message exchanges, you can serialize the result of the encoding as you like (JSON, protobuf, etc.)

Contribute

$ git clone https://github.com/ghilesmeddour/smahat-time-series-encoding.git
$ cd smahat-time-series-compression
make format
make dead-code-check
make test
make type-check
make coverage
make build

TODOs

  • Add unit tests.
  • Improve doc.
You might also like...
Simple yet flexible natural sorting in Python.

natsort Simple yet flexible natural sorting in Python. Source Code: https://github.com/SethMMorton/natsort Downloads: https://pypi.org/project/natsort

A Python utility belt containing simple tools, a stdlib like feel, and extra batteries. Hashing, Caching, Timing, Progress, and more made easy!
A Python utility belt containing simple tools, a stdlib like feel, and extra batteries. Hashing, Caching, Timing, Progress, and more made easy!

Ubelt is a small library of robust, tested, documented, and simple functions that extend the Python standard library. It has a flat API that all behav

Simple python module to get the information regarding battery in python.
Simple python module to get the information regarding battery in python.

Battery Stats A python3 module created for easily reading the current parameters of Battery in realtime. It reads battery stats from /sys/class/power_

A simple Python app that generates semi-random chord progressions.

chords-generator A simple Python app that generates semi-random chord progressions.

A simple and easy to use Spam Bot made in Python!

This is a simple spam bot made in python. You can use to to spam anyone with anything on any platform.

Runes - Simple Cookies You Can Extend (similar to Macaroons)

Runes - Simple Cookies You Can Extend (similar to Macaroons) is a paper called "Macaroons: Cookies with Context

Simple collection of GTPS Flood in Python.

GTPS Flood Simple collection of GTPS Flood in Python. NOTE Give me credit if you use this source, don't trade/sell this tool, And USE AT YOUR OWN RISK

Astvuln is a simple AST scanner which recursively scans a directory, parses each file as AST and runs specified method.

Astvuln Astvuln is a simple AST scanner which recursively scans a directory, parses each file as AST and runs specified method. Some search methods ar

A set of Python scripts to surpass human limits in accomplishing simple tasks.

Human benchmark fooler Summary A set of Python scripts with Selenium designed to surpass human limits in accomplishing simple tasks available on https

Releases(v0.0.1)
Owner
Ghiles Meddour
Data Analyst at Munic
Ghiles Meddour
Standard implementations of FedLab and its provided benchmarks.

FedLab-benchmarks This repo contains standard implementations of FedLab and its provided benchmarks. Currently, following algorithms or benchrmarks ar

SMILELab-FL 104 Dec 05, 2022
produces PCA on genotypes from fasta files (popPhyl's ID format)

popPhyl_PCA Performs PCA of genotypes. Works in two steps. 1. Input file A single fasta file containing different loci, in different populations/speci

camille roux 2 Oct 08, 2021
Implicit hierarchical a posteriori error estimates in FEniCSx

FEniCSx Error Estimation (FEniCSx-EE) Description FEniCSx-EE is an open source library showing how various error estimation strategies can be implemen

Jack S. Hale 1 Dec 08, 2021
A sys-botbase client for remote control automation of Nintendo Switch consoles. Based on SysBot.NET, written in python.

SysBot.py A sys-botbase client for remote control automation of Nintendo Switch consoles. Based on SysBot.NET, written in python. Setup: Download the

7 Dec 16, 2022
Casefy (/keɪsfaɪ/) is a lightweight Python package to convert the casing of strings

Casefy (/keɪsfaɪ/) is a lightweight Python package to convert the casing of strings. It has no third-party dependencies and supports Unicode.

Diego Miguel Lozano 12 Jan 08, 2023
A simple tool to extract python code from a Jupyter notebook, and then run pylint on it for static analysis.

Jupyter Pylinter A simple tool to extract python code from a Jupyter notebook, and then run pylint on it for static analysis. If you find this tool us

Edmund Goodman 10 Oct 13, 2022
Regression Metrics Calculation Made easy

Regression Metrics Mean Absolute Error Mean Square Error Root Mean Square Error Root Mean Square Logarithmic Error Root Mean Square Logarithmic Error

Ashish Patel 12 Jan 02, 2023
Animation retargeting tool for Autodesk Maya. Retargets mocap to a custom rig with a few clicks.

Animation Retargeting Tool for Maya A tool for transferring animation data and mocap from a skeleton to a custom rig in Autodesk Maya. Installation: A

Joaen 63 Jan 06, 2023
Simple tool for creating changelogs

Description Simple utility for quickly generating changelogs, assuming your commits are ordered as they should be. This tool will simply log all lates

2 Jan 05, 2022
一款不需要买代理来减少扫网站目录被封概率的扫描器,适用于中小规格字典。

PoorScanner使用说明书 -工具在不同环境下可能不怎么稳定,如果有什么问题恳请大家反馈。说明书有什么错误的地方也大家欢迎指正。 更新记录 2021.8.23 修复了云函数主程序 gitee上传文件接口写错了的BUG(之前把自己的上传地址写死进去了,没从配置文件里读) 更新了说明书 PoorS

14 Aug 02, 2022
A small python library that helps you to generate localization strings for your mobile projects.

LocalizationUtiltiy A small python library that helps you to generate localization strings for your mobile projects. This small script aims to help yo

1 Nov 12, 2021
PyResToolbox - A collection of Reservoir Engineering Utilities

pyrestoolbox A collection of Reservoir Engineering Utilities This set of functio

Mark W. Burgoyne 39 Oct 17, 2022
An URL checking python module

An URL checking python module

Fayas Noushad 6 Aug 10, 2022
A simple example for calling C++ functions in Python by `ctypes`.

ctypes-example A simple example for calling C++ functions in Python by ctypes. Features call C++ function int bar(int* value, char* msg) with argumene

Yusu Pan 3 Nov 23, 2022
jsoooooooon derulo - Make sure your 'jason derulo' is featured as the first part of your json data

jsonderulo Make sure your 'jason derulo' is featured as the first part of your json data Install: # python pip install jsonderulo poetry add jsonderul

jesse 3 Sep 13, 2021
python-codicefiscale: a tiny library for encode/decode Italian fiscal code - codifica/decodifica del Codice Fiscale.

python-codicefiscale python-codicefiscale is a tiny library for encode/decode Italian fiscal code - codifica/decodifica del Codice Fiscale. Features T

Fabio Caccamo 53 Dec 14, 2022
Creates a C array from a hex-string or a stream of binary data.

hex2array-c Creates a C array from a hex-string. Usage Usage: python3 hex2array_c.py HEX_STRING [-h|--help] Use '-' to read the hex string from STDIN.

John Doe 3 Nov 24, 2022
Create password - Generate Random Password with Passphrase

Generate Random Password with Passphrase This is a python code to generate stron

1 Jan 18, 2022
Here, I find the Fibonacci Series using python

Fibonacci-Series-using-python Here, I find the Fibonacci Series using python Requirements No Special Requirements Contribution I have strong belief on

Sachin Vinayak Dabhade 4 Sep 24, 2021
kawadi is a versatile tool that used as a form of weapon and is used to cut, shape and split wood.

kawadi kawadi (કવાડિ in Gujarati) (Axe in English) is a versatile tool that used as a form of weapon and is used to cut, shape and split wood. kawadi

Jay Vala 2 Jan 10, 2022