Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials

Overview

Data Scientist Learning Plan

This learning path consists of several series of self-paced e-learning courses and paid instructor-led training (ILT) courses. If you are interested in ILT, search the course catalog for more information.

Learning Plan Structure

  • What is the Databricks Lakehouse Platform?

    This course (formerly Fundamentals of the Databricks Lakehouse Platform) is designed for everyone who is brand new to the Platform and wants to learn more about what it is, why it was developed, what it does, and the components that make it up.

    Our goal is that by the time you finish this course, you’ll have a better understanding of the Platform in general and be able to answer questions like: What is Databricks? Where does Databricks fit into my workflow? How have other customers been successful with Databricks?

    Learning objectives

    • Describe what the Databricks Lakehouse Platform is.
    • Explain the origins of the Lakehouse data management paradigm.
    • Outline fundamental problems that cause most enterprises to struggle with managing and making use of their data.
    • Identify the most popular components of the Databricks Lakehouse Platform used by data practitioners, depending on their unique role.
    • Give examples of organizations that have used the Databricks Lakehouse Platform to streamline big data processing and analytics.
  • What is Delta Lake?

    Today, many organizations struggle to deliver successful big data and artificial intelligence (AI) projects. One of the biggest challenges they face is ensuring that quality, reliable data is available to the data practitioners running these projects. After all, an organization that does not have reliable data will not succeed with AI. To help organizations bring structure, reliability, and performance to their data lakes, Databricks created Delta Lake.

    Delta Lake is an open format storage layer that sits on top of your organization’s data lake. It is the foundation of a cost-effective, highly scalable Lakehouse and is an integral part of the Databricks Lakehouse Platform.

    In this course (formerly Fundamentals of Delta Lake), we’ll break down the basics behind Delta Lake: what it does, how it works, and why, from a business perspective, it is valuable to any organization with big data and AI projects.

    Learning objectives

    • Describe how Delta Lake fits into the Databricks Lakehouse Platform.
    • Explain the four elements encompassed by Delta Lake.
    • Summarize high-level Delta Lake functionality that helps organizations solve common challenges related to enterprise-scale data analytics.
    • Articulate examples of how organizations have employed Delta Lake on Databricks to improve business outcomes.
  • What is Databricks SQL?

    Databricks SQL offers SQL users a platform for querying, analyzing, and visualizing data. This course (formerly Fundamentals of Databricks SQL) guides users through the Databricks SQL interface and demonstrates many of its tools and features.

    Learning objectives

    • Describe the basics of the Databricks SQL service.
    • Describe the benefits of using Databricks SQL to perform data analyses.
    • Describe how to complete a basic query, visualization, and dashboard workflow using Databricks SQL.
  • What is Databricks Machine Learning?

    Databricks Machine Learning offers data scientists and other machine learning practitioners a platform for completing and managing the end-to-end machine learning lifecycle. This course (formerly Fundamentals of Databricks Machine Learning) guides business leaders and practitioners through a basic overview of Databricks Machine Learning, the benefits of using Databricks Machine Learning, its fundamental components and functionalities, and examples of successful customer use.

    Learning objectives

    • Give a basic overview of Databricks Machine Learning.
    • Identify how using Databricks Machine Learning benefits data science and machine learning teams.
    • Summarize the fundamental components and functionalities of Databricks Machine Learning.
    • Give examples of how real Databricks customers have used Databricks Machine Learning successfully.
  • Fundamentals of the Databricks Lakehouse Platform Accreditation

  • Apache Spark Programming with Databricks

  • Certification Overview Course for the Databricks Certified Associate Developer for Apache Spark Exam

  • Getting Started with Databricks Machine Learning

  • Scaling Machine Learning Pipelines

Owner
Trung-Duy Nguyen