BigDL - Evaluate the performance of BigDL (Distributed Deep Learning on Apache Spark) in big data analysis problems

Last update: Jan 06, 2022

Overview

Evaluate the performance of BigDL (Distributed Deep Learning on Apache Spark) in big data analysis problems.

Introduction

BigDL is a distributed deep learning library for Apache Spark; with BigDL, users can write their deep learning applications as standard Spark programs, which can directly run on top of existing Spark or Hadoop clusters.

Installation

Download the BigDL packages, or install BigDL with pip (for example, inside a conda environment).

How to run Program on Spark

Usage: spark-submit-with-bigdl.sh [options] file.py

Options:

--master MASTER_URL: The Spark master URL (spark, yarn, k8s, or local).
local[k]: Run Spark locally with k worker threads (ideally, k equals the number of logical cores on your machine).
file.py: The Python file containing the program to run.
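
The usage line above can also be assembled programmatically, for instance when repeating a run with different `local[k]` values. This is a minimal sketch under stated assumptions: `build_submit_command` is a hypothetical helper (not part of BigDL), `lstm_mnist.py` is a placeholder file name, and the launcher script is assumed to forward the standard `spark-submit` option `--master`.

```python
import shlex


def build_submit_command(script, master="local[4]", extra_options=None,
                         launcher="spark-submit-with-bigdl.sh"):
    """Return the argument list for launching `script` on Spark.

    Hypothetical helper for illustration only; it assumes the launcher
    forwards standard spark-submit options such as --master.
    """
    command = [launcher, "--master", master]
    command.extend(extra_options or [])
    command.append(script)
    return command


# Placeholder script name; local[8] asks Spark for 8 local worker threads.
cmd = build_submit_command("lstm_mnist.py", master="local[8]")

# Print a shell-safe version of the command (brackets get quoted).
print(shlex.join(cmd))
```

Returning a list rather than a single string keeps the arguments safe to pass to `subprocess.run` without extra quoting.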

System configuration

The program was run on a system with the following configuration:

System/Host Processor: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
CPU(s): 48
Core(s) per socket: 12
Socket(s): 2
Memory: 183 G (free)

Data Description and Run Model

MNIST is a dataset of 70,000 small square 28×28 pixel grayscale images of handwritten single digits between 0 and 9. The data is split into two parts: 60,000 training images and 10,000 test images.

For this evaluation, we use an LSTM model for the MNIST digit classification problem.
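
One common way to feed MNIST to a recurrent model is to treat each 28×28 image as a sequence of 28 time steps with 28 features each (one image row per step). That layout is an assumption here, since the input format is not spelled out above. The sketch below shows only the reshaping, in plain Python; `image_to_sequence` is a hypothetical name and no BigDL API is involved.

```python
def image_to_sequence(flat_pixels, time_steps=28, features=28):
    """Split a flat list of pixel values into `time_steps` rows of `features` values."""
    expected = time_steps * features
    if len(flat_pixels) != expected:
        raise ValueError("expected %d pixels, got %d" % (expected, len(flat_pixels)))
    # Row i of the image becomes the input vector at time step i.
    return [flat_pixels[i * features:(i + 1) * features] for i in range(time_steps)]


# Synthetic stand-in for one flattened MNIST image (784 values), not real data.
flat_image = list(range(28 * 28))
sequence = image_to_sequence(flat_image)
print(len(sequence), len(sequence[0]))  # prints: 28 28
```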

BigDL Performance Evaluation

Execution running time

Computation Evaluation (SPEED UP)
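
Speed-up is conventionally defined as the running time with one worker divided by the running time with n workers, and parallel efficiency as the speed-up divided by n. The sketch below assumes this standard definition is the one intended by the heading. The function names are hypothetical, and the timings in the usage lines are illustrative placeholders, not measurements from this evaluation.

```python
def speedup(baseline_seconds, parallel_seconds):
    """Return T(1) / T(n): how many times faster the parallel run is."""
    if baseline_seconds <= 0 or parallel_seconds <= 0:
        raise ValueError("running times must be positive")
    return baseline_seconds / parallel_seconds


def efficiency(baseline_seconds, parallel_seconds, workers):
    """Return speed-up divided by the number of workers (1.0 is ideal scaling)."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup(baseline_seconds, parallel_seconds) / workers


# Illustrative placeholder timings only (NOT results from this evaluation).
example_speedup = speedup(100.0, 25.0)
example_efficiency = efficiency(100.0, 25.0, workers=8)
print(example_speedup, example_efficiency)  # prints: 4.0 0.5
```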

Owner

Vo Cong Thanh
