Spark for Beginners

Apache Spark is an open source, scalable, massively parallel, in-memory execution environment for running analytics applications. Think of it as an in-memory layer that sits above multiple data stores, where data can be loaded into memory and analyzed in parallel across a cluster.

SKU: BD-0010


Much like MapReduce, Spark distributes data across a cluster and processes that data in parallel. The difference is that MapReduce writes intermediate results to disk between steps, while Spark keeps them in memory where possible, which typically makes it much faster than MapReduce, especially for multi-step or iterative jobs.
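
As a rough illustration of that idea, here is a minimal Scala sketch. It assumes Spark is on the classpath and runs in local mode on a single machine; the object name InMemorySketch and the helper sumOfSquares are made up for this example and are not part of the course material. It splits a range of numbers into partitions, asks Spark to keep the computed values in memory, and combines the partial results.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; the names here are illustrative only.
object InMemorySketch {

  // Sum of the squares of 1..n, computed across the partitions of an RDD.
  def sumOfSquares(spark: SparkSession, n: Int): Long = {
    // parallelize splits the local range into partitions that tasks process in parallel.
    val numbers = spark.sparkContext.parallelize(1 to n)
    // cache() asks Spark to keep the computed partitions in memory for later reuse.
    val squares = numbers.map(x => x.toLong * x).cache()
    // reduce combines the per-partition partial sums into one result.
    squares.reduce(_ + _)
  }

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM, one worker thread per core
    // (an assumption that suits a laptop; a real cluster would use another master URL).
    val spark = SparkSession.builder()
      .appName("InMemorySketch")
      .master("local[*]")
      .getOrCreate()
    try {
      println(sumOfSquares(spark, 10)) // prints 385
    } finally {
      spark.stop()
    }
  }
}
```

On a real cluster the same code would run unchanged apart from the master setting, with the partitions spread across worker machines instead of local threads.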

Why We Learn Spark:

To gain greater access to Big Data, to make use of existing Big Data investments, to keep pace with growing enterprise adoption, and because demand for Spark developers has recently increased, which tends to mean higher pay.

Course Audience:

Developers who wish to explore Data Science and Big Data

Course Duration: 4 days

Tools Needed:  

The Syllabus

Introduction to the Course

  • What you’ll need
  • Completion targets

Scala and Spark Overview

  • History of Spark and Scala
  • Recent changes to Scala

Scala IDE Options and Overview

  • ScalaIDE overview
  • Computer set-up time!

Scala and Spark Set-up and Installation

  • Windows introduction
  • Windows Scala and Spark installation
  • Atom Windows installation
  • Linux and Mac installation

Scala Programming: Level One

  • Arithmetic and Numbers
  • Values and Variables
  • Booleans and Comparison Operators
  • Strings and Basic Regex
  • Tuples
  • Scala Basics – Assessment Test Exercises
  • Scala Basics – Assessment Test Questions
  • Scala Basics – Assessment Test Solutions


  • Intro to Collections
  • Lists
  • Arrays
  • Scala Lists vs Arrays
  • Sets
  • Maps

Scala Programming: Level Two

  • Flow Control
  • For Loops
  • While Loops
  • Functions

Introduction to Machine Learning

  • Machine Learning with Spark
  • IntelliJ IDEA Installation Overview

Spark DataFrames with Scala

  • Introduction to Spark DataFrames
  • DataFrames Overview
  • Spark DataFrame Operations
  • GroupBy and Aggregate Functions
  • Missing data
  • Dates and Timestamps


Regression with Spark

  • Introduction to Linear Regression
  • Introduction to Regression Section
  • Linear Regression Documentation Example
  • Alternate Linear Regression Data CSV File
  • Linear Regression Walkthrough Part 1
  • Linear Regression Walkthrough Part 2


Classification with Spark

  • Introduction to Classification
  • Classification Documentation Example
  • Spark Classification – Logistic Regression Example – Part 1
  • Spark Classification – Logistic Regression Example – Part 2


Model Evaluation

  • Model Evaluation Overview
  • Spark Model Evaluation – Documentation Example
  • Spark – Model Evaluation – Regression Example


Clustering with Spark

  • Introduction to Clustering with Spark
  • KMeans Theory Lecture
  • Example of KMeans with Spark


PCA with Spark

  • PCA Theory Overview
  • PCA with Spark – Project Exercise
  • PCA with Spark – Documentation Example


Databricks and Spark

  • Databricks Overview
  • Introduction to Spark Recommendation Systems
  • Spark Recommender System Implementation
  • Zeppelin Notebooks on AWS Elastic MapReduce

Additional information

Course Duration

4 days

Lab Count

12 Labs



