Apache HBase

Apache HBase is an open-source NoSQL database that provides real-time read/write access to large datasets.

HBase scales linearly to handle huge data sets with billions of rows and millions of columns, and it easily combines data sources that use a wide variety of different structures and schemas. HBase is natively integrated with Hadoop and works seamlessly alongside other data access engines through YARN.

Apache HBase provides random, real-time access to your data in Hadoop. It was created for hosting very large tables, making it a great choice for storing multi-structured or sparse data. Users can query HBase for a particular point in time, making “flashback” queries possible. These characteristics make HBase a great choice for storing semi-structured data such as log data and then serving that data very quickly to users or applications integrated with HBase.
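To make the idea of sparse, versioned cells and point-in-time ("flashback") reads more concrete, here is a minimal in-memory sketch. It is a toy model only and does not use the HBase client API: the class `ToyVersionedTable`, its methods, and the sample row and column names are all hypothetical, chosen purely for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


class ToyVersionedTable:
    """In-memory stand-in for a sparse, versioned table (NOT the HBase API)."""

    def __init__(self):
        # row key -> column name -> list of (timestamp, value), sorted by timestamp.
        # Only cells that were actually written are stored, so rows can be sparse.
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, timestamp):
        """Store a new version of a cell; older versions are kept."""
        versions = self._rows[row][column]
        versions.append((timestamp, value))
        versions.sort(key=lambda version: version[0])

    def get(self, row, column, as_of=None):
        """Return the newest value written at or before `as_of` (latest if None)."""
        versions = self._rows.get(row, {}).get(column, [])
        if not versions:
            return None
        if as_of is None:
            return versions[-1][1]
        timestamps = [ts for ts, _ in versions]
        idx = bisect_right(timestamps, as_of)
        return versions[idx - 1][1] if idx else None


# Hypothetical log data: one host whose status changes over time.
table = ToyVersionedTable()
table.put("host-42", "log:status", "starting", timestamp=100)
table.put("host-42", "log:status", "running", timestamp=200)

latest = table.get("host-42", "log:status")               # newest version
flashback = table.get("host-42", "log:status", as_of=150)  # value as of t=150
```

In this sketch a read with `as_of=150` sees only the version written at timestamp 100, which is the essence of a point-in-time query. A real HBase cluster stores cell versions with timestamps and lets clients restrict reads by time; the exact client calls are covered by the HBase documentation rather than by this toy model.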

SKU: BD-0005


This course prepares anyone for the Apache Hadoop platform. There are two primary distributions of Hadoop, those of Hortonworks and Cloudera. The course is designed to allow maximum customization for your needs.

You can decide whether the students need Hortonworks, Cloudera, or both. This choice affects the labs but has no effect on course length.

The Administrator course deals with 3-5 deployed servers. The course can use Docker to deploy these servers if you wish.

Why We Learn Hadoop:

To gain greater access to Big Data, to make use of existing Big Data investments, to keep pace with growing enterprise adoption, and because demand for Hadoop developers has recently increased.

Course Audience:

Administrators who wish to explore Data Science and Big Data

Course Duration: 4 days

Introduction to HBase

  • HBase Cluster Basics
  • Installing the Ecosystem
  • Environment Variables and OS Settings
  • Installing HBase
  • Hadoop/ZooKeeper/HBase configurations
  • High Availability (HA)
  • HBase Data
  • Importing Data
  • MySQL
  • The bulk load tool
  • MapReduce
  • Regions
  • The HBase Shell
  • Executing Java methods
  • Row counter
  • WAL
  • HFile viewer
  • Querying HBase with Hive

HBase Administration

  • HBase Master UI
  • Using HBase Shell
  • HBase hbck
  • Monitoring HBase
  • HBase Table Disk Utilization
  • Monitoring Tools
  • Ganglia
  • OpenTSDB
  • Nagios
  • Cluster status
  • Hot regions

HBase Maintenance

  • RPC DEBUG-level logging
  • Node Maintenance
  • Decommissioning Nodes
  • Adding Nodes
  • Restarting HBase
  • Managing HBase processes
  • Deployment

Troubleshooting HBase

  • Tools
  • Common Errors
  • XceiverCount
  • Open File Count
  • Native Threads
  • HDFS clients
  • ZooKeeper
  • Start up
  • Disaster Recovery in HBase
  • CopyTable
  • Dump files on HDFS
  • NameNode metadata
  • Region starting keys
  • Cluster replication

HBase Performance-Oriented Configurations

  • Distributing disk I/O in Hadoop
  • Rack-awareness in Hadoop
  • Mounting disks
  • VM Swap settings
  • Java GC and HBase heap settings
  • Compression
  • Region Splits
  • Using YCSB
  • Region server handler count
  • Region creation algorithms
  • Update blocking on clusters

Tuning HBase

  • Client-side tuning
  • Client-side scanning
  • Tuning memory size for MemStores
  • Block cache configuration and block size to improve seek performance
  • Bloom Filters
  • Security and HBase
  • Kerberos authentication for Hadoop and HBase
  • Configuring HDFS security with Kerberos

Additional information

Course Duration

3 Days


On-site, Remotely

Lab Count

18 Labs

