Simple spill-to-disk dictionary

Chest

A dictionary that spills to disk.

Chest acts like a dictionary but can write its contents to disk. This is useful in two situations:

  1. Chest can hold datasets that are larger than memory
  2. Chest persists and so can be saved and loaded for later use

Related Projects

The standard library shelve is an alternative out-of-core dictionary. Chest offers the following benefits over shelve:

  1. Chest supports any hashable key (not just strings)
  2. Chest supports pluggable serialization and file saving schemes
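The key restriction in shelve can be seen with the standard library alone. The snippet below is a small sketch, not part of chest: the helper name `try_shelve_key` is made up for illustration, and the exception types caught are what current CPython raises when a non-string key is encoded.

```python
import os
import shelve
import tempfile


def try_shelve_key(db, key, value):
    """Return True if shelve accepts the key, False if it rejects it."""
    try:
        db[key] = value
        return True
    except (TypeError, AttributeError):
        # shelve encodes keys as strings, so non-string keys are rejected
        return False


with tempfile.TemporaryDirectory() as tmp:
    with shelve.open(os.path.join(tmp, "demo")) as db:
        string_key_ok = try_shelve_key(db, "x", [1, 2, 3])
        tuple_key_ok = try_shelve_key(db, ("x", 0), [1, 2, 3])
        # Common workaround with shelve: turn the key into a string yourself
        db[repr(("x", 0))] = [4, 5, 6]
        workaround_value = db[repr(("x", 0))]
```

With shelve the caller has to invent a string encoding for every key, which is the burden that support for arbitrary hashable keys removes.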

Alternatively one might consider a traditional key-value store database like Redis.

Shove is another excellent alternative with support for a variety of stores.

How it works

Chest stores data in two locations:

  1. An in-memory dictionary
  2. On the filesystem in a directory owned by the chest

As a user adds contents to the chest, the in-memory dictionary fills up. When a chest holds more data in memory than desired (see the available_memory= keyword argument), it writes the larger contents of the chest to disk as pickle files (the choice of pickle is configurable). When a user asks for a value, chest checks the in-memory store first, then checks on disk and loads the value into memory if necessary, pushing other values to disk to make room.
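The spill-and-reload cycle described above can be sketched with a toy class. This is not chest's implementation: `TinySpillDict` and its attributes are hypothetical names, keys are assumed to be strings that are valid file names, and sizes are estimated with the shallow `sys.getsizeof`.

```python
import os
import pickle
import sys
import tempfile


class TinySpillDict:
    """Toy spill-to-disk mapping (hypothetical, not chest's actual code)."""

    def __init__(self, path, available_memory):
        self.path = path
        self.available_memory = available_memory  # byte budget for in-memory values
        self.inmem = {}       # key -> value currently held in memory
        self.on_disk = set()  # keys whose current value lives only on disk
        os.makedirs(path, exist_ok=True)

    def _filename(self, key):
        # Assumption: keys are strings that are valid file names
        return os.path.join(self.path, key)

    def _memory_used(self):
        # Rough estimate; sys.getsizeof does not follow references
        return sum(sys.getsizeof(v) for v in self.inmem.values())

    def _shrink(self):
        # Spill the largest in-memory values until the budget is met
        while self.inmem and self._memory_used() > self.available_memory:
            key = max(self.inmem, key=lambda k: sys.getsizeof(self.inmem[k]))
            with open(self._filename(key), "wb") as f:
                pickle.dump(self.inmem.pop(key), f)
            self.on_disk.add(key)

    def __setitem__(self, key, value):
        self.on_disk.discard(key)
        self.inmem[key] = value
        self._shrink()

    def __getitem__(self, key):
        if key in self.inmem:
            return self.inmem[key]
        if key not in self.on_disk:
            raise KeyError(key)
        # Load from disk, then re-check the memory budget
        with open(self._filename(key), "rb") as f:
            value = pickle.load(f)
        self.on_disk.discard(key)
        self.inmem[key] = value
        self._shrink()
        return value


with tempfile.TemporaryDirectory() as tmp:
    d = TinySpillDict(tmp, available_memory=2000)
    d["small"] = b"a" * 100
    d["big"] = b"b" * 5000       # exceeds the budget, so it is spilled
    spilled = sorted(d.on_disk)
    big_back = d["big"]          # transparently read back from disk
```

The sketch always evicts the largest value, following the description above; a real implementation could choose a different eviction policy and a more accurate size measure.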

Chest is a simple project. It was intended to provide a simple interface to assist in the storage and retrieval of numpy arrays. However, its design and implementation are agnostic to this case, so it could be used in a variety of other situations.

With minimal work chest could be extended to serve as a communication point between multiple processes.

Known Failings

Chest was designed to hold a moderate number of largish numpy arrays. It doesn't handle the use case of very many small key-value pairs (though it could with small effort). In particular, chest has the following deficiencies:

  1. Chest is not multi-process safe. We should institute a file lock at least around the .keys file.
  2. Chest does not support mutation of variables on disk.
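One shape a file lock around the .keys file could take is sketched below. Chest does not provide this: `lock_file`, its parameters, and the `.lock` suffix are hypothetical, and the approach relies on exclusive file creation, which is atomic on typical local filesystems but leaves a stale lock behind if a process dies while holding it.

```python
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def lock_file(path, timeout=5.0, poll=0.05):
    """Hypothetical advisory lock: hold `path + '.lock'` while the block runs."""
    lock_path = path + ".lock"
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the lock file already exists
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError("could not acquire " + lock_path)
            time.sleep(poll)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


with tempfile.TemporaryDirectory() as tmp:
    keys_file = os.path.join(tmp, ".keys")
    with lock_file(keys_file):
        held = os.path.exists(keys_file + ".lock")
        with open(keys_file, "w") as f:
            f.write("x\n")
    released = not os.path.exists(keys_file + ".lock")
```

Every process that touches the directory would need to use the same lock for it to have any effect, since the lock is purely advisory.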

LICENSE

New BSD. See License

Install

chest is available through conda:

conda install chest

chest is on the Python Package Index (PyPI):

pip install chest

Example

>>> from chest import Chest
>>> c = Chest()

>>> # Acts like a normal dictionary
>>> c['x'] = [1, 2, 3]
>>> c['x']
[1, 2, 3]

>>> # Data persists to local files
>>> c.flush()
>>> import os
>>> os.listdir(c.path)
['.keys', 'x']

>>> # These files hold pickled results
>>> import pickle
>>> pickle.load(open(c.key_to_filename('x'), 'rb'))  # pickle files are binary
[1, 2, 3]

>>> # Though one normally accesses these files with chest itself
>>> c2 = Chest(path=c.path)
>>> c2.keys()
['x']
>>> c2['x']
[1, 2, 3]

>>> # Chest is configurable, so one can use json, etc. instead of pickle
>>> import json
>>> c = Chest(path='my-chest', dump=json.dump, load=json.load)
>>> c['x'] = [1, 2, 3]
>>> c.flush()

>>> json.load(open(c.key_to_filename('x')))
[1, 2, 3]

Dependencies

Chest supports Python 2.6+ and Python 3.2+ with a common codebase.

It currently depends on the heapdict library, which is a lightweight dependency.

Owner
Blaze