Python function to query SQLite files stored on S3

Overview

sqlite-s3-query CircleCI Test Coverage

Python function to query a SQLite file stored on S3. It uses multiple HTTP range requests per query to avoid downloading the entire file, and so is suitable for large databases.

All the HTTP requests for a query request the same version of the database object in S3, so queries should complete succesfully even if the database is replaced concurrently by another S3 client. Versioning must be enabled on the S3 bucket.

Operations that write to the database are not supported.

Inspired by phiresky's sql.js-httpvfs, and dacort's Stack Overflow answer.

Installation

sqlite-s3-query depends on APSW, which is not available on PyPI, but can be installed directly from GitHub.

pip install sqlite_s3_query
pip install https://github.com/rogerbinns/apsw/releases/download/3.36.0-r1/apsw-3.36.0-r1.zip --global-option=fetch --global-option=--version --global-option=3.36.0 --global-option=--all --global-option=build --global-option=--enable-all-extensions

Usage

from sqlite_s3_query import sqlite_s3_query

results_iter = sqlite_s3_query(
    'SELECT * FROM my_table WHERE my_column = ?', params=('my-value',),
    url='https://my-bucket.s3.eu-west-2.amazonaws.com/my-db.sqlite',
)

for row in results_iter:
    print(row)

If in your project you use multiple queries to the same file, functools.partial can be used to make an interface with less duplication.

from functools import partial
from sqlite_s3_query import sqlite_s3_query

query_my_db = partial(sqlite_s3_query,
    url='https://my-bucket.s3.eu-west-2.amazonaws.com/my-db.sqlite',
)

for row in query_my_db('SELECT * FROM my_table WHERE my_col = ?', params=('my-value',)):
    print(row)

for row in query_my_db('SELECT * FROM my_table_2 WHERE my_col = ?', params=('my-value',)):
    print(row)

The AWS region and the credentials are taken from environment variables, but this can be changed using the get_credentials parameter. Below shows the default implementation of this that can be overriden.

import os
from functools import partial
from sqlite_s3_query import sqlite_s3_query

query_my_db = partial(sqlite_s3_query
    url='https://my-bucket.s3.eu-west-2.amazonaws.com/my-db.sqlite',
    get_credentials=lambda: (
        os.environ['AWS_DEFAULT_REGION'],
        os.environ['AWS_ACCESS_KEY_ID'],
        os.environ['AWS_SECRET_ACCESS_KEY'],
        os.environ.get('AWS_SESSION_TOKEN'),  # Only needed for temporary credentials
    ),
)

for row in query_my_db('SELECT * FROM my_table_2 WHERE my_col = ?', params=('my-value',)):
    print(row)
Owner
Michal Charemza
Michal Charemza
TinyDB is a lightweight document oriented database optimized for your happiness :)

Quick Links Example Code Supported Python Versions Documentation Changelog Extensions Contributing Introduction TinyDB is a lightweight document orien

Markus Siemens 5.6k Dec 30, 2022
pickleDB is an open source key-value store using Python's json module.

pickleDB pickleDB is lightweight, fast, and simple database based on the json module. And it's BSD licensed! pickleDB is Fun import pickledb

Harrison Erd 738 Jan 04, 2023
A Python wrapper API for operating and working with the Neo4j Graph Data Science (GDS) library

gdsclient This repo hosts the sources for gdsclient, a Python wrapper API for operating and working with the Neo4j Graph Data Science (GDS) library. g

Neo Technology 101 Jan 05, 2023
This repo contains the backend of the KMK project

KMK Backend This repository contains the backend part of the KMK project Demo Watch it on Youtube Getting started Pre-commit hooks After you cloned th

21 Nov 26, 2022
Makes google's political ad database actually useful

Making Google's political ad transparency library suck less This is a series of scripts that takes Google's political ad transparency data and makes t

The Guardian 7 Apr 28, 2022
MyReplitDB - the most simplistic and easiest wrapper to use for replit's database system.

MyReplitDB is the most simplistic and easiest wrapper to use for replit's database system. Installing You can install it from the PyPI Or y

kayle 4 Jul 03, 2022
LaikaDB, banco de dados para projetos simples.

LaikaDB LaikaDB é um banco de dados noSQL para uso local e simples, onde você pode realizar gravações e leituras de forma eficiente e simples. Todos o

Jaedson Silva 0 Jun 24, 2022
This project is related to a No-SQL database, whose data are referred to autoctone botanic species

This project is related to a No-SQL database, whose data are referred to autoctone botanic species. The final goal is creating a function that performs the estimation of the ornamental value, given t

Amatofrancesco99 2 Mar 08, 2022
AWS Tags As A Database is a Python library using AWS Tags as a Key-Value database.

AWS Tags As A Database is a Python library using AWS Tags as a Key-Value database. This database is completely free* 💸

Oren Leung 42 Nov 25, 2022
A fast ordered NoSQL database.

MerkavaDB Note This is still in active development. Things will change. If you are interested in helping out, or would like to see any particular feat

Adam Hopkins 6 Sep 29, 2022
Python object-oriented database

ZODB, a Python object-oriented database ZODB provides an object-oriented database for Python that provides a high-degree of transparency. ZODB runs on

Zope 574 Dec 31, 2022
ChaozzDBPy - A python implementation based on the original ChaozzDB from Chaozznl with some new features

ChaozzDBPy About ChaozzDBPy is a python implementation based on the original Cha

Igor Iglesias 1 May 25, 2022
PathfinderMonsterDatabase - A database of all monsters in Pathfinder 1e, created by parsing aonprd.com

PathfinderMonsterDatabase A database of all monsters in Pathfinder 1e, created by parsing aonprd.com Setup Run the following line to install all requi

Yoni Lerner 11 Jun 12, 2022
Turn SELECT queries returned by a query into links to execute them

datasette-query-links Turn SELECT queries returned by a query into links to execute them Installation Install this plugin in the same environment as D

Simon Willison 5 Apr 27, 2022
LightDB is a lightweight JSON Database for Python

LightDB What is this? LightDB is a lightweight JSON Database for Python that allows you to quickly and easily write data to a file Installing pip3 ins

Stanislaw 14 Oct 01, 2022
Shelf DB is a tiny document database for Python to stores documents or JSON-like data

Shelf DB Introduction Shelf DB is a tiny document database for Python to stores documents or JSON-like data. Get it $ pip install shelfdb shelfquery S

Um Nontasuwan 35 Nov 03, 2022
Tiny local JSON database for Python.

Pylowdb Simple to use local JSON database 🦉 # This is pure python, not specific to pylowdb ;) db.data['posts'] = ({ 'id': 1, 'title': 'pylowdb is awe

Hussein Sarea 3 Jan 26, 2022
Codeqlcompile - 自动反编译闭源应用,创建codeql数据库

codeql_compile 自动反编译闭源应用,创建codeql数据库 准备 首先下载ecj.jar和idea提供反编译的java-decompiler.ja

236 Jan 05, 2023
Mongita is to MongoDB as SQLite is to SQL

Mongita is a lightweight embedded document database that implements a commonly-used subset of the MongoDB/PyMongo interface. Mongita differs from MongoDB in that instead of being a server, Mongita is

Scott Rogowski 809 Jan 07, 2023
Manage your sqlite database very easy (like django) ...

Manage your sqlite database very easy (like django) ...

aWolver 1 Feb 09, 2022