My own Unicode compression algorithm

Overview

Zee Code

ZCode is a custom compression algorithm I originally developed for a competition held for the Spring 2019 Datastructures and Algorithms course of Dr. Mahdi Safarnejad-Boroujeni at Sharif University of Technology, at which I became first-place. The code is pretty slow and has a lot of room for optimization, but it is pretty readable. It can be an excellent educational resource for whoever is starting on compression algorithms.

The algorithm is a cocktail of classical compression algorithms mixed and served for Unicode documents. It hinges around the LZW algorithm to create a finite size symbol dictionary; the results are then byte-coded into variable-length custom symbols, which I call zee codes! Finally, the symbol table is truncated accordingly, and the compressed document is encoded into a byte stream.

Huffman trees highly inspire zee codes, but because in normal texts, symbols are usually much more uniformly distributed than the original geometrical (or exponential) distribution assumption for effective Huffman coding, the gains of using variable-sized byte-codes both from an implementation and performance perspective outweighed bit Huffman encodings. Results may vary, but my tests showed a steady ~4-5x compression ratio on Farsi texts, which is pretty nice!

Installation

ZCode is available on pip, and only requires a 3.6 or higher python installation beforehand.

pip install -U zcode

Usage

You can run the algorithm for any utf-8 encoded file using the zcode command. It will automatically decompress files ending with a .zee extensions and compress others into .zee files, but you can always override the default behavior by providing optional arguments like:

zcode INPUTFILE [--output OUTPUT_FILE --action compress/decompress --symbol-size SYMBOL_SIZE --code-size CODE_SIZE]

The symbol-size argument controls the algorithms' buffer size for processing symbols (in bytes). It is automatically set depending on your input file size but you can change it as you wish. code-size controls the maximum length for coded bytes while encoding symbols (this equals to 2 by default and needs to be provided to the algorithm upon decompression).

LICENSE

MIT LICENSE, see vahidzee/zcode/LICENSE

You might also like...
Search algorithm implementations meant for teaching

Search-py A collection of search algorithms for teaching and experimenting. Non-adversarial Search There’s a heavy separation of concerns which leads

8-puzzle-solver with UCS, ILS, IDA* algorithm
8-puzzle-solver with UCS, ILS, IDA* algorithm

Eight Puzzle 8-puzzle-solver with UCS, ILS, IDA* algorithm pre-usage requirements python3 python3-pip virtualenv prepare enviroment virtualenv -p pyth

A Python Package for Portfolio Optimization using the Critical Line Algorithm

A Python Package for Portfolio Optimization using the Critical Line Algorithm

🧬 Training the car to do self-parking using a genetic algorithm
🧬 Training the car to do self-parking using a genetic algorithm

🧬 Training the car to do self-parking using a genetic algorithm

A raw implementation of the nearest insertion algorithm to resolve TSP problems in a TXT format.

TSP-Nearest-Insertion A raw implementation of the nearest insertion algorithm to resolve TSP problems in a TXT format. Instructions Load a txt file wi

A GUI visualization of QuickSort algorithm
A GUI visualization of QuickSort algorithm

QQuickSort A simple GUI visualization of QuickSort algorithm. It only uses PySide6, it does not have any other external dependency. How to run Install

A python implementation of the Basic Photometric Stereo Algorithm
A python implementation of the Basic Photometric Stereo Algorithm

Photometric-Stereo A python implementation of the Basic Photometric Stereo Algorithm Result Usage run Photometric_Stereo.py Code Tree |data #原始数据,tga格

Python package to monitor the power consumption of any algorithm

CarbonAI This project aims at creating a python package that allows you to monitor the power consumption of any python function. Documentation The com

marching rectangles algorithm in python with clean code.

Marching Rectangles marching rectangles algorithm in python with clean code. Tools Python 3 EasyDraw Creators Mohammad Dori Run the Code Installation

Releases(v0.0.1)
Owner
Vahid Zehtab
Applied & Theoretical Deep Learning Researcher currently studying my last semester of Compute Engineering Bachelors program at Sharif University of Technology
Vahid Zehtab
A tictactoe where you never win, implemented using minimax algorithm

Unbeatable_TicTacToe A tictactoe where you never win, implemented using minimax algorithm Requirements Make sure you have the pygame module along with

Jessica Jolly 3 Jul 28, 2022
This project consists of a collaborative filtering algorithm to predict movie reviews ratings from a dataset of Netflix ratings.

Collaborative Filtering - Netflix movie reviews Description This project consists of a collaborative filtering algorithm to predict movie reviews rati

Shashank Kumar 1 Dec 21, 2021
Classic algorithms including Fizz Buzz, Bubble Sort, the Fibonacci Sequence, a Sudoku solver, and more.

Algorithms Classic algorithms including Fizz Buzz, Bubble Sort, the Fibonacci Sequence, a Sudoku solver, and more. Algorithm Complexity Time and Space

1 Jan 14, 2022
Solving a card game with three search algorithms: BFS, IDS, and A*

Search Algorithms Overview In this project, we want to solve a card game with three search algorithms. In this card game, we have to sort our cards by

Korosh 5 Aug 04, 2022
This is a Python implementation of the HMRF algorithm on networks with categorial variables.

Salad Salad is an Open Source Python library to segment tissues into different biologically relevant regions based on Hidden Markov Random Fields. The

1 Nov 16, 2021
A Python implementation of Jerome Friedman's Multivariate Adaptive Regression Splines

py-earth A Python implementation of Jerome Friedman's Multivariate Adaptive Regression Splines algorithm, in the style of scikit-learn. The py-earth p

431 Dec 15, 2022
This is a demo for AAD algorithm.

Asynchronous-Anisotropic-Diffusion-Algorithm This is a demo for AAD algorithm. The subroutine of the anisotropic diffusion algorithm is modified from

3 Mar 21, 2022
So far implements A* will add more later

Pathfinding_Visualization Finds the shortest path between two nodes. The light blue path is the shortest path. The black nodes are barriers. Created i

Lukas DeLoach 1 Jan 18, 2022
🧬 Performant Evolutionary Algorithms For Python with Ray support

🧬 Performant Evolutionary Algorithms For Python with Ray support

Nathan 49 Oct 20, 2022
Implementation of an ordered dithering algorithm used in computer graphics

Ordered Dithering Project In this project, we use an ordered dithering method to turn an RGB image, first to a gray scale image and then, turn the gra

1 Oct 26, 2021
This is an implementation of the QuickHull algorithm in Python. I

QuickHull This is an implementation of the QuickHull algorithm in Python. It randomly generates a set of points and finds the convex hull of this set

Anant Joshi 4 Dec 04, 2022
Path tracing obj - (taichi course final project) a path tracing renderer that can import and render obj files

Path tracing obj - (taichi course final project) a path tracing renderer that can import and render obj files

5 Sep 10, 2022
A calculator to test numbers against the collatz conjecture

The Collatz Calculator This is an algorithm custom built by Kyle Dickey, used to test numbers against the simple rules of the Collatz Conjecture. Get

Kyle Dickey 2 Jun 14, 2022
A Python program to easily solve the n-queens problem using min-conflicts algorithm

QueensProblem A program to easily solve the n-queens problem using min-conflicts algorithm Performances estimated with a sample of 1000 different rand

0 Oct 21, 2022
Distributed Grid Descent: an algorithm for hyperparameter tuning guided by Bayesian inference, designed to run on multiple processes and potentially many machines with no central point of control

Distributed Grid Descent: an algorithm for hyperparameter tuning guided by Bayesian inference, designed to run on multiple processes and potentially many machines with no central point of control.

Martin 1 Jan 01, 2022
The test data, code and detailed description of the AW t-SNE algorithm

AW-t-SNE The test data, code and result of the AW t-SNE algorithm Structure of the folder Datasets: This folder contains two datasets, the MNIST datas

1 Mar 09, 2022
Using A * search algorithm and GBFS search algorithm to solve the Romanian problem

Romanian-problem-using-Astar-and-GBFS Using A * search algorithm and GBFS search algorithm to solve the Romanian problem Romanian problem: The agent i

Mahdi Hassanzadeh 6 Nov 22, 2022
A lightweight, object-oriented finite state machine implementation in Python with many extensions

transitions A lightweight, object-oriented state machine implementation in Python with many extensions. Compatible with Python 2.7+ and 3.0+. Installa

4.7k Jan 01, 2023
A fast, pure python implementation of the MuyGPs Gaussian process realization and training algorithm.

Fast implementation of the MuyGPs Gaussian process hyperparameter estimation algorithm MuyGPs is a GP estimation method that affords fast hyperparamet

Lawrence Livermore National Laboratory 13 Dec 02, 2022
A GUI visualization of QuickSort algorithm

QQuickSort A simple GUI visualization of QuickSort algorithm. It only uses PySide6, it does not have any other external dependency. How to run Install

Jaime R. 2 Dec 24, 2021