This repository has datasets containing information of Uber pickups in NYC from April 2014 to September 2014 and January to June 2015. data Analysis , virtualization and some insights are gathered here

Overview

uber-pickups-analysis

Data Source: https://www.kaggle.com/fivethirtyeight/uber-pickups-in-new-york-city

Information about data set

The dataset contains, roughly, TWO groups of files: ● Uber trip data from 2014 (April - September), separated by month, with detailed location information. ● Uber trip data from 2015 (January - June), with less fine-grained location information.

Uber trip data from 2014 There are six files of raw data on Uber pickups in New York City from April to September 2014. The files are separated by month and each has the following columns: ● Date/Time : The date and time of the Uber pickup ● Lat : The latitude of the Uber pickup ● Lon : The longitude of the Uber pickup ● Base : The TLC base company code affiliated with the Uber pickup. These files are named:

● uber-raw-data-apr14.csv ● uber-raw-data-aug14.csv ● uber-raw-data-jul14.csv ● uber-raw-data-jun14.csv ● uber-raw-data-may14.csv ● uber-raw-data-sep14.csv

Uber trip data from 2015

Also included is the file uber-raw-data-janjune-15.csv This file has the following columns: ● Dispatching_base_num : The TLC base company code of the base that dispatched the Uber. ● Pickup_date : The date and time of the Uber pickup ● Affiliated_base_num : The TLC base company code affiliated with the Uber pickup. ● locationID : The pickup location ID affiliated with the Uber pickup These files are named:

  • uber-raw-data-janjune-15.csv

motive of Project

To analyze the data of the customer rides and visualize the data to find insights that can help improve business. Data analysis and visualization is an important part of data science. They are used to gather insights from the data and with visualization you can get quick information from the data.

How to Run the Project

In order to run the project just download the data from above mentioned source then run any file.

Prerequisites

You need to have installed following softwares and libraries in your machine before running this project.

Python 3 Anaconda: It will install ipython notebook and most of the libraries which are needed like sklearn, pandas, seaborn, matplotlib, numpy, scipy.

Installing

Python 3: https://www.python.org/downloads/ Anaconda: https://www.anaconda.com/download/

Authors

DEVA DEEKSHITH and kilari jaswanth(https://github.com/Kilarijaswanth)- combined work

Owner
B DEVA DEEKSHITH
B DEVA DEEKSHITH
Book Recommender System Using Sci-kit learn N-neighbours

Model-Based-Recommender-Engine I created a book Recommender System using Sci-kit learn's N-neighbours algorithm for my model and the streamlit library

1 Jan 13, 2022
This is my implementation on the K-nearest neighbors algorithm from scratch using Python

K Nearest Neighbors (KNN) algorithm In this Machine Learning world, there are various algorithms designed for classification problems such as Logistic

sonny1902 1 Jan 08, 2022
Required for a machine learning pipeline data preprocessing and variable engineering script needs to be prepared

Feature-Engineering Required for a machine learning pipeline data preprocessing and variable engineering script needs to be prepared. When the dataset

kemalgunay 5 Apr 21, 2022
李航《统计学习方法》复现

本项目复现李航《统计学习方法》每一章节的算法 特点: 笔记摘要:在每个文件开头都会有一些核心的摘要 pythonic:这里会用尽可能规范的方式来实现,包括编程风格几乎严格按照PEP8 循序渐进:前期的算法会更list的方式来做计算,可读性比较强,后期几乎完全为numpy.array的计算,并且辅助详

58 Oct 22, 2021
Bottleneck a collection of fast, NaN-aware NumPy array functions written in C.

Bottleneck Bottleneck is a collection of fast, NaN-aware NumPy array functions written in C. As one example, to check if a np.array has any NaNs using

Python for Data 835 Dec 27, 2022
Home repository for the Regularized Greedy Forest (RGF) library. It includes original implementation from the paper and multithreaded one written in C++, along with various language-specific wrappers.

Regularized Greedy Forest Regularized Greedy Forest (RGF) is a tree ensemble machine learning method described in this paper. RGF can deliver better r

RGF-team 363 Dec 14, 2022
A quick reference guide to the most commonly used patterns and functions in PySpark SQL

Using PySpark we can process data from Hadoop HDFS, AWS S3, and many file systems. PySpark also is used to process real-time data using Streaming and

Sundar Ramamurthy 53 Dec 21, 2022
Relevance Vector Machine implementation using the scikit-learn API.

scikit-rvm scikit-rvm is a Python module implementing the Relevance Vector Machine (RVM) machine learning technique using the scikit-learn API. Quicks

James Ritchie 204 Nov 18, 2022
Practical Time-Series Analysis, published by Packt

Practical Time-Series Analysis This is the code repository for Practical Time-Series Analysis, published by Packt. It contains all the supporting proj

Packt 325 Dec 23, 2022
Machine learning algorithms implementation

Machine learning algorithms implementation This repository consisits of implementation of various machine learning algorithms. The algorithms implemen

Karun Dawadi 1 Jan 03, 2022
Nevergrad - A gradient-free optimization platform

Nevergrad - A gradient-free optimization platform nevergrad is a Python 3.6+ library. It can be installed with: pip install nevergrad More installati

Meta Research 3.4k Jan 08, 2023
PySurvival is an open source python package for Survival Analysis modeling

PySurvival What is Pysurvival ? PySurvival is an open source python package for Survival Analysis modeling - the modeling concept used to analyze or p

Square 265 Dec 27, 2022
Formulae is a Python library that implements Wilkinson's formulas for mixed-effects models.

formulae formulae is a Python library that implements Wilkinson's formulas for mixed-effects models. The main difference with other implementations li

34 Dec 21, 2022
Distributed deep learning on Hadoop and Spark clusters.

Note: we're lovingly marking this project as Archived since we're no longer supporting it. You are welcome to read the code and fork your own version

Yahoo 1.3k Dec 28, 2022
An open-source library of algorithms to analyse time series in GPU and CPU.

An open-source library of algorithms to analyse time series in GPU and CPU.

Shapelets 216 Dec 30, 2022
Combines Bayesian analyses from many datasets.

PosteriorStacker Combines Bayesian analyses from many datasets. Introduction Method Tutorial Output plot and files Introduction Fitting a model to a d

Johannes Buchner 19 Feb 13, 2022
A project based example of Data pipelines, ML workflow management, API endpoints and Monitoring.

MLOps template with examples for Data pipelines, ML workflow management, API development and Monitoring.

Utsav 33 Dec 03, 2022
A collection of video resources for machine learning

Machine Learning Videos This is a collection of recorded talks at machine learning conferences, workshops, seminars, summer schools, and miscellaneous

Dustin Tran 1.5k Dec 29, 2022
XAI - An eXplainability toolbox for machine learning

XAI - An eXplainability toolbox for machine learning XAI is a Machine Learning library that is designed with AI explainability in its core. XAI contai

The Institute for Ethical Machine Learning 875 Dec 27, 2022
Automated Machine Learning Pipeline for tabular data. Designed for predictive maintenance applications, failure identification, failure prediction, condition monitoring, etc.

Automated Machine Learning Pipeline for tabular data. Designed for predictive maintenance applications, failure identification, failure prediction, condition monitoring, etc.

Amplo 10 May 15, 2022