Finds, downloads, parses, and standardizes public bikeshare data into a standard pandas dataframe format

Related tags

Data Analysisopendata
Overview

opendata

Finds, downloads, parses, and standardizes public bikeshare data into a standard pandas dataframe format.

import asyncio
from opendata.sources.bikeshare.bay_wheels import trips as bay_wheels

trips_df, _ = asyncio.run(bay_wheels.async_load(trip_sample_rate=1000))

len(trips_df.index)
# 8731

trips_df.columns
# Index(['started_at', 'ended_at', 'start_station_id', 'end_station_id',
#        'start_station_name', 'end_station_name', 'rideable_type', 'ride_id',
#        'start_lat', 'start_lng', 'end_lat', 'end_lng', 'gender', 'user_type',
#        'bike_id', 'birth_year'],
#       dtype='object')

An example analysis can be found here: https://observablehq.com/@brady/bikeshare

Supports sampling and local file caching to improve performance.

Markets supported

import opendata.sources.bikeshare.bay_wheels
import opendata.sources.bikeshare.bixi
import opendata.sources.bikeshare.divvy
import opendata.sources.bikeshare.capital_bikeshare
import opendata.sources.bikeshare.citi_bike
import opendata.sources.bikeshare.cogo
import opendata.sources.bikeshare.niceride
import opendata.sources.bikeshare.bluebikes
import opendata.sources.bikeshare.metro_bike_share
import opendata.sources.bikeshare.indego

Bootstrap

Set up your environment

brew install chromedriver
brew install python3
python3 -m pip install pre-commit
pre-commit install --install-hooks
python3 -m venv venv
source venv/bin/activate
python3 -m pip install -r requirements.txt

Entering virtualenv

python3 -m venv venv
source venv/bin/activate
python3 -m pip install -r requirements.txt

Usage

Try the test export to CSV:

python3 test.py

Updating pip requirements

pip-compile

Pre-commit setup

pre-commit install --install-hooks

Bikeshare markets to add

USA

  • 119k/yr Pittsburgh (google drive links)
  • 180k/yr Austin (date and time fields separate)

World

  • 3868k/yr Ecobici (need station CSV)
  • 2900k/yr Toronto (needs more investigation)
  • 650k/yr Vancouver (google drive links)
Owner
Brady Law
prev SWE @lyft and @apple
Brady Law
A stock analysis app with streamlit

StockAnalysisApp A stock analysis app with streamlit. You select the ticker of the stock and the app makes a series of analysis by using the price cha

Antonio Catalano 50 Nov 27, 2022
Hatchet is a Python-based library that allows Pandas dataframes to be indexed by structured tree and graph data.

Hatchet Hatchet is a Python-based library that allows Pandas dataframes to be indexed by structured tree and graph data. It is intended for analyzing

Lawrence Livermore National Laboratory 14 Aug 19, 2022
MIR Cheatsheet - Survival Guidebook for MIR Researchers in the Lab

MIR Cheatsheet - Survival Guidebook for MIR Researchers in the Lab

SeungHeonDoh 3 Jul 02, 2022
cLoops2: full stack analysis tool for chromatin interactions

cLoops2: full stack analysis tool for chromatin interactions Introduction cLoops2 is an extension of our previous work, cLoops. From loop-calling base

YaqiangCao 25 Dec 14, 2022
Catalogue data - A Python Scripts to prepare catalogue data

catalogue_data Scripts to prepare catalogue data. Setup Clone this repo. Install

BigScience Workshop 3 Mar 03, 2022
A script to "SHUA" H1-2 map of Mercenaries mode of Hearthstone

lushi_script Introduction This script is to "SHUA" H1-2 map of Mercenaries mode of Hearthstone Installation Make sure you installed python=3.6. To in

210 Jan 02, 2023
MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation [ECCV2020]

MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation [ECCV2020] by Kaisiyuan Wang, Qianyi Wu, Linsen Song, Zhuoqian Yang, Wa

112 Dec 28, 2022
Reading streams of Twitter data, save them to Kafka, then process with Kafka Stream API and Spark Streaming

Using Streaming Twitter Data with Kafka and Spark Reading streams of Twitter data, publishing them to Kafka topic, process message using Kafka Stream

Rustam Zokirov 1 Dec 06, 2021
BasstatPL is a package for performing different tabulations and calculations for descriptive statistics.

BasstatPL is a package for performing different tabulations and calculations for descriptive statistics. It provides: Frequency table constr

Angel Chavez 1 Oct 31, 2021
HyperSpy is an open source Python library for the interactive analysis of multidimensional datasets

HyperSpy is an open source Python library for the interactive analysis of multidimensional datasets that can be described as multidimensional arrays o

HyperSpy 411 Dec 27, 2022
Advanced Pandas Vault — Utilities, Functions and Snippets (by @firmai).

PandasVault ⁠— Advanced Pandas Functions and Code Snippets The only Pandas utility package you would ever need. It has no exotic external dependencies

Derek Snow 374 Jan 07, 2023
PyClustering is a Python, C++ data mining library.

pyclustering is a Python, C++ data mining library (clustering algorithm, oscillatory networks, neural networks). The library provides Python and C++ implementations (C++ pyclustering library) of each

Andrei Novikov 1k Jan 05, 2023
Program that predicts the NBA mvp based on data from previous years.

NBA MVP Predictor A machine learning model using RandomForest Regression that predicts NBA MVP's using player data. Explore the docs » View Demo · Rep

Muhammad Rabee 1 Jan 21, 2022
Shot notebooks resuming the main functions of GeoPandas

Shot notebooks resuming the main functions of GeoPandas, 2 notebooks written as Exercises to apply these functions.

1 Jan 12, 2022
Fancy data functions that will make your life as a data scientist easier.

WhiteBox Utilities Toolkit: Tools to make your life easier Fancy data functions that will make your life as a data scientist easier. Installing To ins

WhiteBox 3 Oct 03, 2022
Snakemake workflow for converting FASTQ files to self-contained CRAM files with maximum lossless compression.

Snakemake workflow: name A Snakemake workflow for description Usage The usage of this workflow is described in the Snakemake Workflow Catalog. If

Algorithms for reproducible bioinformatics (Koesterlab) 1 Dec 16, 2021
PyIOmica (pyiomica) is a Python package for omics analyses.

PyIOmica (pyiomica) This repository contains PyIOmica, a Python package that provides bioinformatics utilities for analyzing (dynamic) omics datasets.

G. Mias Lab 13 Jun 29, 2022
Tools for working with MARC data in Catalogue Bridge.

catbridge_tools Tools for working with MARC data in Catalogue Bridge. Borrows heavily from PyMarc

1 Nov 11, 2021
A multi-platform GUI for bit-based analysis, processing, and visualization

A multi-platform GUI for bit-based analysis, processing, and visualization

Mahlet 529 Dec 19, 2022