Query multiple mongoDB database collections easily

Overview

leakscoop

Perform queries across multiple MongoDB databases and collections, where the field names and the field content structure in each database may vary.

The Problem

Suppose you've got two database collections, "leak1" and "leak2"

In leak1, the schema looks like this:

FIRST_NAME: "JOHN"
LAST_NAME: "DOE"

and in leak2, the schema looks like this:

FName: "John"
LName: "Doe"

A simple program to iterate through all your collections and perform queries wouldn't work, because:

  • the field names are different. Notice that in leak1, the first name field is FIRST_NAME, while in leak2, the first name field is named FName.
  • the field values might be structured differently. In leak1, everything is captialized. In leak2, it's all title-case.

This program lets you write a configuration for each collection, specifying, in JSON, how to query each field.

It's a work in progress, but so far, it works pretty well. It'll probably be easier to understand if you take a look at the config files under ./collections/. Each JSON file under ./collections/ should be an array of objects. The program automatically processes all JSON files under that directory.

Some more info for how the configurations work can be found in the wiki.

Example Usage:

Find all records of a guy named John Doe.

python3 -m dev --firstname John --lastname Doe

Each database will be searched, and results will be put into a new file under ./results/

Find all records for someone with an address of "1234 NW Long St" python3 -m dev --address "1234 NW long st"

Adding a zipcode to the end, or a state/province might speed up the query (depending on how you index your databases)

Screenshot

(information redacted for this person's privacy)

image Configured fields will print to the console, while all the other fields in a result will be saved under ./results/.

Owner
bagel
bagel
A fast unobtrusive MongoDB ODM for Python.

MongoFrames MongoFrames is a fast unobtrusive MongoDB ODM for Python designed to fit into a workflow not dictate one. Documentation is available at Mo

getme 45 Jun 01, 2022
aiomysql is a library for accessing a MySQL database from the asyncio

aiomysql aiomysql is a "driver" for accessing a MySQL database from the asyncio (PEP-3156/tulip) framework. It depends on and reuses most parts of PyM

aio-libs 1.5k Jan 03, 2023
An extension package of 🤗 Datasets that provides support for executing arbitrary SQL queries on HF datasets

datasets_sql A 🤗 Datasets extension package that provides support for executing arbitrary SQL queries on HF datasets. It uses DuckDB as a SQL engine

Mario Šaško 19 Dec 15, 2022
Micro ODM for MongoDB

Beanie - is an asynchronous ODM for MongoDB, based on Motor and Pydantic. It uses an abstraction over Pydantic models and Motor collections to work wi

Roman 993 Jan 03, 2023
Redis client for Python asyncio (PEP 3156)

Redis client for Python asyncio. Redis client for the PEP 3156 Python event loop. This Redis library is a completely asynchronous, non-blocking client

Jonathan Slenders 554 Dec 04, 2022
An asyncio compatible Redis driver, written purely in Python. This is really just a pet-project for me.

asyncredis An asyncio compatible Redis driver. Just a pet-project. Information asyncredis is, like I've said above, just a pet-project for me. I reall

Vish M 1 Dec 25, 2021
A simple wrapper to make a flat file drop in raplacement for mongodb out of TinyDB

Purpose A simple wrapper to make a drop in replacement for mongodb out of tinydb. This module is an attempt to add an interface familiar to those curr

180 Jan 01, 2023
ClickHouse Python Driver with native interface support

ClickHouse Python Driver ClickHouse Python Driver with native (TCP) interface support. Asynchronous wrapper is available here: https://github.com/myma

Marilyn System 957 Dec 30, 2022
MinIO Client SDK for Python

MinIO Python SDK for Amazon S3 Compatible Cloud Storage MinIO Python SDK is Simple Storage Service (aka S3) client to perform bucket and object operat

High Performance, Kubernetes Native Object Storage 582 Dec 28, 2022
A pythonic interface to Amazon's DynamoDB

PynamoDB A Pythonic interface for Amazon's DynamoDB. DynamoDB is a great NoSQL service provided by Amazon, but the API is verbose. PynamoDB presents y

2.1k Dec 30, 2022
SAP HANA Connector in pure Python

SAP HANA Database Client for Python Important Notice This public repository is read-only and no longer maintained. The active maintained alternative i

SAP Archive 299 Nov 20, 2022
Find graph motifs using intuitive notation

d o t m o t i f Find graph motifs using intuitive notation DotMotif is a library that identifies subgraphs or motifs in a large graph. It looks like t

APL BRAIN 45 Jan 02, 2023
Python client for Apache Kafka

Kafka Python client Python client for the Apache Kafka distributed stream processing system. kafka-python is designed to function much like the offici

Dana Powers 5.1k Jan 08, 2023
Create a database, insert data and easily select it with Sqlite

sqliteBasics create a database, insert data and easily select it with Sqlite Watch on YouTube a step by step tutorial explaining this code: https://yo

Mariya 27 Dec 27, 2022
Records is a very simple, but powerful, library for making raw SQL queries to most relational databases.

Records: SQL for Humansâ„¢ Records is a very simple, but powerful, library for making raw SQL queries to most relational databases. Just write SQL. No b

Kenneth Reitz 6.9k Jan 03, 2023
A Python wheel containing PostgreSQL

postgresql-wheel A Python wheel for Linux containing a complete, self-contained, locally installable PostgreSQL database server. All servers run as th

Michel Pelletier 71 Nov 09, 2022
Making it easy to query APIs via SQL

Shillelagh Shillelagh (ʃɪˈleɪlɪ) is an implementation of the Python DB API 2.0 based on SQLite (using the APSW library): from shillelagh.backends.apsw

Beto Dealmeida 207 Dec 30, 2022
This is a repository for a task assigned to me by Bilateral solutions!

Processing-Files-using-MySQL This is a repository for a task assigned to me by Bilateral solutions! Task: Make Folders named Processing,queue and proc

Kandal Khandeka 1 Nov 07, 2022
SAP HANA Connector in pure Python

SAP HANA Database Client for Python A pure Python client for the SAP HANA Database based on the SAP HANA Database SQL Command Network Protocol. pyhdb

SAP 299 Nov 20, 2022
Logica is a logic programming language that compiles to StandardSQL and runs on Google BigQuery.

Logica: language of Big Data Logica is an open source declarative logic programming language for data manipulation. Logica is a successor to Yedalog,

Evgeny Skvortsov 1.5k Dec 30, 2022