bigdata_analyse 大数据分析项目

Overview

bigdata_analyse

大数据分析项目

wish

采用不同的技术栈,通过对不同行业的数据集进行分析,期望达到以下目标:

  • 了解不同领域的业务分析指标
  • 深化数据处理、数据分析、数据可视化能力
  • 增加大数据批处理、流处理的实践经验
  • 增加数据挖掘的实践经验

tip

  • 项目主要使用的编程语言是 python、sql、hql
  • .ipynb 可以用 jupyter notebook 打开,如何安装, 可以参考 jupyter notebook

jupyter notebook 是一种网页交互形式的 python 编辑器,直接通过 pip 安装,也支持 markdown,很适合用来做数据分析可视化以及写文章、写示例代码等。

list

主题 处理方式 技术栈 数据集下载
1 亿条淘宝用户行为数据分析 离线处理 清洗 hive + 分析 hive + 可视化 echarts 阿里云 或者 百度网盘 提取码:5ipq
1000 万条淘宝用户行为数据实时分析 实时处理 数据源 kafka + 实时分析 flink + 可视化(es + kibana) 百度网盘 提取码:m4mc
300 万条《野蛮时代》的玩家数据分析 离线处理 清洗 pandas + 分析 mysql + 可视化 pyecharts 百度网盘 提取码:paq4
130 万条深圳通刷卡数据分析 离线处理 清洗 pandas + 分析 impala + 可视化 dbeaver 百度网盘 提取码:t561
10 万条厦门招聘数据分析 离线处理 清洗 pandas + 分析 hive + 可视化 ( hue + pyecharts ) + 预测 sklearn 百度网盘 提取码:9wx0
7000 条租房数据分析 离线处理 清洗 pandas + 分析 sqlite + 可视化 matplotlib 百度网盘 提取码:9en3
6000 条倒闭企业数据分析 离线处理 清洗 pandas + 分析 pandas + 可视化 (jupyter notebook + pyecharts) 百度网盘 提取码:xvgm

refer

  1. https://tianchi.aliyun.com/dataset/
  2. https://opendata.sz.gov.cn/data/api/toApiDetails/29200_00403601
  3. https://www.kesci.com/home/dataset
Common bioinformatics database construction

biodb Common bioinformatics database construction 1.taxonomy (Substance classification database) Download the database wget -c https://ftp.ncbi.nlm.ni

sy520 2 Jan 04, 2022
This mini project showcase how to build and debug Apache Spark application using Python

Spark app can't be debugged using normal procedure. This mini project showcase how to build and debug Apache Spark application using Python programming language. There are also options to run Spark a

Denny Imanuel 1 Dec 29, 2021
A crude Hy handle on Pandas library

Quickstart Hyenas is a curde Hy handle written on top of Pandas API to allow for more elegant access to data-scientist's powerhouse that is Pandas. In

Peter Výboch 4 Sep 05, 2022
Active Learning demo using two small datasets

ActiveLearningDemo How to run step one put the dataset folder and use command below to split the dataset to the required structure run utils.py For ea

3 Nov 10, 2021
An interactive grid for sorting, filtering, and editing DataFrames in Jupyter notebooks

qgrid Qgrid is a Jupyter notebook widget which uses SlickGrid to render pandas DataFrames within a Jupyter notebook. This allows you to explore your D

Quantopian, Inc. 2.9k Jan 08, 2023
Stream-Kafka-ELK-Stack - Weather data streaming using Apache Kafka and Elastic Stack.

Streaming Data Pipeline - Kafka + ELK Stack Streaming weather data using Apache Kafka and Elastic Stack. Data source: https://openweathermap.org/api O

Felipe Demenech Vasconcelos 2 Jan 20, 2022
bigdata_analyse 大数据分析项目

bigdata_analyse 大数据分析项目 wish 采用不同的技术栈,通过对不同行业的数据集进行分析,期望达到以下目标: 了解不同领域的业务分析指标 深化数据处理、数据分析、数据可视化能力 增加大数据批处理、流处理的实践经验 增加数据挖掘的实践经验

Way 2.4k Dec 30, 2022
Analysis scripts for QG equations

qg-edgeofchaos Analysis scripts for QG equations FIle/Folder Structure eigensolvers.py - Spectral and finite-difference solvers for Rossby wave eigenf

Norman Cao 2 Sep 27, 2022
Statistical Rethinking: A Bayesian Course Using CmdStanPy and Plotnine

Statistical Rethinking: A Bayesian Course Using CmdStanPy and Plotnine Intro This repo contains the python/stan version of the Statistical Rethinking

Andrés Suárez 3 Nov 08, 2022
Predictive Modeling & Analytics on Home Equity Line of Credit

Predictive Modeling & Analytics on Home Equity Line of Credit Data (Python) HMEQ Data Set In this assignment we will use Python to examine a data set

Dhaval Patel 1 Jan 09, 2022
A notebook to analyze Amazon Recommendation Review Dataset.

Amazon Recommendation Review Dataset Analyzer A notebook to analyze Amazon Recommendation Review Dataset. Features Calculates distinct user count, dis

isleki 3 Aug 22, 2022
Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials

Data Scientist Learning Plan Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials

Trung-Duy Nguyen 27 Nov 01, 2022
Projeto para realizar o RPA Challenge . Utilizando Python e as bibliotecas Selenium e Pandas.

RPA Challenge in Python Projeto para realizar o RPA Challenge (www.rpachallenge.com), utilizando Python. O objetivo deste desafio é criar um fluxo de

Henrique A. Lourenço 1 Apr 12, 2022
Sensitivity Analysis Library in Python (Numpy). Contains Sobol, Morris, Fractional Factorial and FAST methods.

Sensitivity Analysis Library (SALib) Python implementations of commonly used sensitivity analysis methods. Useful in systems modeling to calculate the

SALib 663 Jan 05, 2023
Tokyo 2020 Paralympics, Analytics

Tokyo 2020 Paralympics, Analytics Thanks for checking out my app! It was built entirely using matplotlib and Tokyo 2020 Paralympics data. This applica

Petro Ivaniuk 1 Nov 18, 2021
Anomaly Detection with R

AnomalyDetection R package AnomalyDetection is an open-source R package to detect anomalies which is robust, from a statistical standpoint, in the pre

Twitter 3.5k Dec 27, 2022
TextDescriptives - A Python library for calculating a large variety of statistics from text

A Python library for calculating a large variety of statistics from text(s) using spaCy v.3 pipeline components and extensions. TextDescriptives can be used to calculate several descriptive statistic

150 Dec 30, 2022
Using Python to derive insights on particular Pokemon, Types, Generations, and Stats

Pokémon Analysis Andreas Nikolaidis February 2022 Introduction Exploratory Analysis Correlations & Descriptive Statistics Principal Component Analysis

Andreas 1 Feb 18, 2022
Airflow ETL With EKS EFS Sagemaker

Airflow ETL With EKS EFS & Sagemaker (en desarrollo) Diagrama de la solución Imp

1 Feb 14, 2022
Nobel Data Analysis

Nobel_Data_Analysis This project is for analyzing a set of data about people who have won the Nobel Prize in different fields and different countries

Mohammed Hassan El Sayed 1 Jan 24, 2022