Intake is a lightweight package for finding, investigating, loading and disseminating data.

Last update: Jan 01, 2023

Overview

Intake: A general interface for loading data

Intake is a lightweight set of tools for loading and sharing data in data science projects. Intake helps you:

Load data from a variety of formats (see the current list of known plugins) into containers you already know, like Pandas dataframes, Python lists, NumPy arrays, and more.
Convert boilerplate data loading code into reusable Intake plugins.
Describe data sets in catalog files for easy reuse and sharing between projects and with others.
Share catalog information (and data sets) over the network with the Intake server.
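
As a rough illustration of the catalog idea above, the following sketch writes a tiny CSV file and a catalog file that describes it, then shows how such a catalog might be opened. The source name example_csv, the file names, and both helper functions are hypothetical, and load_with_intake assumes the classic Intake catalog API (intake.open_catalog, attribute access to sources, and read()) with the dataframe dependencies installed; check the documentation for the version you have.

```python
import pathlib
import tempfile

# Hypothetical catalog with a single CSV source; all names are illustrative.
CATALOG_TEMPLATE = """\
sources:
  example_csv:
    description: Small illustrative CSV data set
    driver: csv
    args:
      urlpath: "{csv_path}"
"""


def write_example_catalog(directory):
    """Write example.csv and catalog.yml into directory; return the catalog path."""
    directory = pathlib.Path(directory)
    csv_path = directory / "example.csv"
    csv_path.write_text("name,value\nalpha,1\nbeta,2\n")
    catalog_path = directory / "catalog.yml"
    # as_posix() keeps the path free of backslashes inside the quoted YAML string.
    catalog_path.write_text(CATALOG_TEMPLATE.format(csv_path=csv_path.as_posix()))
    return catalog_path


def load_with_intake(catalog_path):
    """Open the catalog and read the CSV source into memory.

    Assumption: the classic Intake catalog API is available.
    """
    import intake  # imported lazily so the rest of the sketch runs without it

    catalog = intake.open_catalog(str(catalog_path))
    return catalog.example_csv.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = write_example_catalog(tmp)
        print(path.read_text())
```

Keeping the description in a catalog file means another project only needs the catalog path and the source name, not the loading code.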

Documentation is available at Read the Docs.

The status of Intake and related packages is available on the Status Dashboard.

Weekly news about this repo and other related projects can be found on the wiki.

Install

Recommended method using conda:

conda install -c conda-forge intake

You can also install with pip, in which case you choose how many of the optional dependencies to install. The simplest option has the fewest requirements:

pip install intake

Optional extras are available as [server], [plot] and [dataframe]. To include everything:

pip install intake[complete]

Note that you may well need specific drivers and other plugins, which usually have additional dependencies of their own.
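
After installing, one way to see which packages actually ended up in the environment is to query the installed distribution metadata. This is only a minimal sketch using the Python standard library; the helper name installed_version and the list of packages checked are illustrative choices, not part of Intake.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Illustrative selection; extend it with whichever drivers or plugins you need.
    for name in ("intake", "pandas", "numpy"):
        version = installed_version(name)
        print(f"{name}: {version if version is not None else 'not installed'}")
```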

Development

Create a development Python environment with the required dependencies, ideally with conda. The requirements can be found in the yml files in the scripts/ci/ directory of this repo.
- For example: conda env create -f scripts/ci/environment-py38.yml, then conda activate test_env
Install Intake in editable mode with pip install -e .[complete]
Use pytest to run the tests.
Create a fork on GitHub to be able to submit PRs.
We respect, but do not enforce, PEP 8 standards; all new code should be covered by tests.
