Apache Hadoop for Administrators

Apache Hadoop is an open source, scalable framework for distributed storage and massively parallel processing of large data sets, and is widely used as the foundation for data lakes. The Apache Hadoop software library allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.
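As a rough illustration of the "simple programming models" mentioned above, the following is a minimal sketch of the MapReduce pattern as a word count in plain Python. It does not use any Hadoop API and runs on a single machine; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for this example. On a real cluster, the framework would distribute the map and reduce steps across many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group all values by key, as happens between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine the values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    # On a cluster, different lines (or blocks) would be mapped on different machines.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data on Hadoop", "Hadoop stores big files"]
    print(word_count(sample))
```

The map and reduce functions share no state, which is what lets a framework run them in parallel on separate machines and simply rerun a piece of work if one machine fails.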

SKU: BD-0008

Description

This course prepares students to work with the Apache Hadoop platform. The two primary Hadoop distributions are Hortonworks and Cloudera. The course is structured so that it can be customized to your needs.

You can decide whether the students need Hortonworks, Cloudera, or both. The choice affects the labs but not the course length.

The Administrator course works with 3-5 deployed servers. If you wish, these servers can be deployed using Docker.

Why We Learn Hadoop:

Learning Hadoop helps you increase your access to Big Data, make use of existing Big Data investments, and keep pace with growing enterprise adoption. Overall, demand for Hadoop developers has also increased recently.

Course Audience:

Administrators who wish to explore Data Science and Big Data

Course Duration: 3 days

BIG DATA

  • What is Big Data?
  • Typical Distributed Systems
  • A Short History of Hadoop
  • Who are the players?
  • Hadoop Alternatives

HADOOP

  • What is Hadoop?
  • YARN
  • Key differences between 1.X and 2.X

HDFS

  • What is HDFS?
  • HDFS Architecture
  • Writing and reading files
  • Understanding Block storage
  • Nodes
  • HDFS client connections

YARN

  • MapReduce as a Pattern
  • MapReduce YARN Style
  • Tracing a MapReduce Job on Hadoop 2.0

HADOOP CLUSTER

  • Most common cluster topology
  • Sizing considerations
  • Hardware, Software, OS, network considerations
  • File Systems – Windows and Linux
  • Hadoop Configuration Files

ADVANCED CLUSTERING

  • Advanced HDFS
  • Advanced MapReduce
  • Advanced YARN configurations
  • Rack Aware Clusters
  • Including and Excluding Hosts

CLUSTER MONITORING & MANAGEMENT

  • Monitoring Options in the Ecosystem
  • Monitoring YARN-based clusters
  • Ganglia on Hadoop
  • Nagios
  • Interop of Ganglia & Nagios
  • Other options
  • Monitoring Java Processes
  • Understanding JVM Memory
  • Memory Analyzers
  • JMX Monitoring

HADOOP SERVICES

  • HDFS NFS Gateway
  • Configuring NFS Gateway
  • Using NFS Gateway Services

HADOOP SECURITY

  • Security over Hadoop
  • Kerberos
  • Authentication & Authorization
  • Knox

CLUSTER MAINTENANCE

  • HDFS snapshots
  • Backing up HDFS Clusters
  • Commissioning Hadoop nodes
  • Load Balancing Hadoop

SCHEDULERS

  • Hadoop Job Scheduling
  • Built-in schedulers
  • Capacity schedulers and Queues
  • Configuring Schedulers & Permissions

HIVE & PIG

  • Introduction to Hive & Pig
  • Comparing Hive with RDBMS
  • Hive & Pig Components & Metastore
  • HiveServer2
  • Hive Command Line Interface
  • Defining Hive Tables
  • Loading Data into Hive & Performing Queries
  • Hive Security
  • Pig Tables & Syntax

OOZIE

  • Introduction to Oozie
  • Architecture
  • Administration of Oozie
 

SQOOP

  • The Sqoop Import Tool
  • Importing & Exporting data with Sqoop
 

FLUME

  • Flume Sources, Sinks and Channels
  • Flume Interceptors
  • Flume configuration
  • Monitoring Flume

MOVING DATA

  • Data Movement with Hadoop
  • ETL using Hadoop
  • Ingesting data with Hadoop
  • Using Hue
  • Distributed Copy
 

HADOOP HA

  • HA for Hadoop
  • How HA works
  • Failover on Hadoop

Additional information

Course Duration: 3 days

Location: On-site or remote

Lab Count: 18 labs
