1-800-TRA-INING

Spark for Beginners

Apache Spark is an open source, scalable, massively parallel, in-memory execution environment for running analytics applications. Think of it as an in-memory layer that sits above multiple data stores, where data can be loaded into memory and analyzed in parallel across a cluster.

SKU: BD-0010

Description

Much like MapReduce, Spark works to distribute data across a cluster, and process that data in parallel. The difference is, unlike MapReduce—which shuffles files around on disk—Spark works in memory, making it much faster at processing data than MapReduce.
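
The idea of splitting data into partitions, processing each partition in parallel in memory, and then combining the partial results can be sketched in plain Scala. This is only an illustration of the concept on a single machine and does not use Spark's API; the names `PartitionSketch`, `partition`, and `parallelSum` are hypothetical and chosen for this example.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object PartitionSketch {
  // Split the data into roughly equal in-memory partitions (hypothetical helper).
  def partition[A](data: Seq[A], numPartitions: Int): Seq[Seq[A]] = {
    val size = math.max(1, math.ceil(data.size.toDouble / numPartitions).toInt)
    data.grouped(size).toSeq
  }

  // Sum each partition on its own task, then combine the partial sums.
  def parallelSum(data: Seq[Int], numPartitions: Int): Long = {
    val partials: Seq[Future[Long]] =
      partition(data, numPartitions).map(p => Future(p.map(_.toLong).sum))
    Await.result(Future.sequence(partials), 10.seconds).sum
  }

  def main(args: Array[String]): Unit = {
    val total = parallelSum(1 to 1000, numPartitions = 4)
    println(total) // prints 500500
  }
}
```

In a real Spark job the partitions would live on different machines in the cluster and the framework would handle scheduling and data movement; here the tasks simply run on local threads.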

Why We Learn Spark:

Learning Spark broadens your access to Big Data, lets you make use of existing Big Data investments, and keeps you in step with growing enterprise adoption. Demand for Spark developers has also increased recently, which translates into higher pay.

Course Audience:

Developers who wish to explore Data Science and Big Data

Course Duration: 4 days

Tools Needed:  

The Syllabus

Introduction to the Course

  • What you’ll need
  • Completion targets
 

Scala and Spark Overview

  • History of Spark and Scala
  • Recent changes to Scala
 

Scala IDE Options and Overview

  • ScalaIDE overview
  • Computer set-up time!
 

Scala and Spark Set-up and Installation

  • Windows introduction
  • Windows Scala and Spark installation
  • Atom Windows installation
  • Linux and Mac installation
 

Scala Programming: Level One

  • Arithmetic and Numbers
  • Values and Variables
  • Booleans and Comparison Operators
  • Strings and Basic Regex
  • Tuples
  • Scala Basics – Assessment Test Exercises
  • Scala Basics – Assessment Test Questions
  • Scala Basics – Assessment Test Solutions
 

Collections

  • Intro to Collections
  • Lists
  • Arrays
  • Scala Lists vs Arrays
  • Sets
  • Maps
 

Scala Programming: Level Two

  • Flow Control
  • For Loops
  • While Loops
  • Functions

Introduction to Machine Learning

  • Machine Learning with Spark
  • IntelliJ IDEA Installation Overview

Spark DataFrames with Scala

  • Introduction to Spark DataFrames
  • DataFrames Overview
  • Spark DataFrame Operations
  • GroupBy and Aggregate Functions
  • Missing data
  • Date and Timestamps

 

Regression with Spark

  • Introduction to Linear Regression
  • Introduction to Regression Section
  • Linear Regression Documentation Example
  • Alternate Linear Regression Data CSV File
  • Linear Regression Walkthrough Part 1
  • Linear Regression Walkthrough Part 2

 

Classification with Spark

  • Introduction to Classification
  • Classification Documentation Example
  • Spark Classification – Logistic Regression Example – Part 1
  • Spark Classification – Logistic Regression Example – Part 2

 

Model Evaluation

  • Model Evaluation Overview
  • Spark Model Evaluation – Documentation Example
  • Spark – Model Evaluation – Regression Example

 

Clustering with Spark

  • Introduction to Clustering with Spark
  • KMeans Theory Lecture
  • Example of KMeans with Spark

 

PCA with Spark

  • PCA Theory Overview
  • PCA with Spark – Project Exercise
  • PCA with Spark – Documentation Example

 

Databricks and Spark

  • Databricks Overview
  • Introduction to Spark Recommendation Systems
  • Spark Recommender System Implementation
  • Zeppelin Notebooks on AWS Elastic MapReduce

Additional information

Course Duration

4 Days

Lab Count

12 Labs

Location

On-site
