Important dataframe statistics with a single command

Last update: Dec 19, 2021

Overview

quick_eda

Receiving dataframe statistics with one command

Project description

A Python package for data scientists, students, ML engineers and anyone who wants dataframe metadata without having to type numerous commands.

Installation

Use pip to install quick-eda by typing or copying the following command.

pip install quick-eda

License

This package is licensed under the BSD 3-Clause License.

Example usage

You can import the individual submodules of the package, for example:

import quick_eda.df_eda
import quick_eda.column_eda

This loads the submodules quick_eda.df_eda and quick_eda.column_eda. They must be referenced with their full name.

quick_eda.df_eda.df_eda(<df>)
quick_eda.column_eda.column_eda(<df>, <column_name>)

An alternative way of importing the submodules is:

from quick_eda import df_eda
from quick_eda import column_eda

This also loads the submodules quick_eda.df_eda and quick_eda.column_eda, but makes them available without the package prefix, so they can be used as follows:

df_eda.df_eda(<df>)
column_eda.column_eda(<df>, <column_name>)

Yet another variation is to import the desired functions directly:

from quick_eda.df_eda import df_eda
from quick_eda.column_eda import column_eda

Again, this loads the submodules, but makes the functions themselves directly available:

df_eda(<df>)
column_eda(<df>, <column_name>)
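
The three import styles above are ordinary Python import mechanics and behave the same way for any package. As a minimal sketch, the standard library's json.decoder submodule is used here as a stand-in (an assumption for illustration only; it is not part of quick_eda) to show that all three styles load the same submodule and resolve to the same object:

```python
import sys

# Style 1: import the submodule and reference it by its full dotted name.
import json.decoder
full = json.decoder.JSONDecoder

# Style 2: import the submodule from the package and drop the package prefix.
from json import decoder
short = decoder.JSONDecoder

# Style 3: import the desired object directly.
from json.decoder import JSONDecoder

# All three names refer to the very same class object.
same_object = full is short is JSONDecoder

# In every case the submodule itself ends up loaded.
submodule_loaded = "json.decoder" in sys.modules

print(same_object, submodule_loaded)
```

Which style to pick is mostly a matter of readability: the full dotted name makes the origin of a function obvious, while the direct import keeps call sites short.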

Imagine you have a dataframe called pets with the columns name, age and color. You could then run statistics on either the entire dataframe or a single column, e.g. age, with:

df_eda(pets)
column_eda(pets, "age")
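
To make the pets example concrete, here is a minimal sketch that builds such a dataframe with pandas. The sample values are made up, and the manual statistics shown are only an assumption about the kind of commands the package is meant to save you from typing; the exact output of df_eda and column_eda is not described here. The quick_eda calls run only if the package is installed.

```python
import pandas as pd

# Hypothetical sample data; the values are invented for illustration.
pets = pd.DataFrame(
    {
        "name": ["Rex", "Mia", "Bubbles", "Coco"],
        "age": [3, 5, 1, 7],
        "color": ["brown", "black", "orange", "white"],
    }
)

# Manual pandas commands one might otherwise type one by one
# (an assumed selection, not a description of quick_eda's output).
n_rows, n_cols = pets.shape
missing_per_column = pets.isna().sum()
age_summary = pets["age"].describe()

print(n_rows, n_cols)
print(missing_per_column)
print(age_summary)

# The calls from this README, guarded so the sketch also runs without quick-eda.
try:
    from quick_eda.df_eda import df_eda
    from quick_eda.column_eda import column_eda
except ImportError:
    print("quick-eda is not installed; skipping the package calls")
else:
    df_eda(pets)
    column_eda(pets, "age")
```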

Source code & further information

The source code is maintained at https://github.com/sveneschlbeck/quick_eda
The repository also contains further information on the BSD license, the contributing guidelines and more.


Owner

Sven Eschlbeck
