Explain to me the difference between HBase and HDFS

Question

I need to learn the difference between HDFS and HBase in more detail.

ravikiran · Answer 1 · Apr 12, 2019

Hadoop generally consists of three major components:

HDFS

It is the distributed file system in Hadoop. It stores big data across the nodes of a cluster so that you can apply your business logic to it.

MapReduce

MapReduce is a Java-based programming framework that lets you apply your business logic through a mapper and a reducer and process the data in a distributed fashion.
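To make the mapper and reducer idea concrete, here is a minimal word-count sketch in plain Java. It only imitates the map, shuffle and reduce steps in a single process and does not use the Hadoop API; the class name WordCountSketch and its methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the map -> shuffle -> reduce flow; not the Hadoop API.
class WordCountSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: combine all values emitted for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: map every line, group the pairs by key (the "shuffle"), then reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {files=1, hbase=1, hdfs=1, rows=1, stores=2}
        System.out.println(run(List.of("hdfs stores files", "hbase stores rows")));
    }
}
```

In a real cluster the map calls run in parallel on the nodes that hold the input blocks, and the framework does the grouping between the two phases. The sketch keeps everything in one loop only to show the shape of the logic.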

YARN

It acts as an intermediate manager between HDFS and MapReduce, allocating resources such as memory and processor units to the jobs that need them.

The main limitation of HDFS is that it is built for sequential access: it cannot efficiently process data that is not read sequentially, and it offers no random read-write option. This is where HBase comes into the picture.

Hadoop

Real-time processing is not practical, since the data must first be loaded into HDFS before any further operations can run.
Follows the write-once, read-many-times principle.
Designed for streaming access to data, i.e. reading large files sequentially.

HBase

Stores key/value pairs in a columnar fashion (columns are grouped together as column families).
Provides low-latency access to small amounts of data from within a large data set.
Provides a flexible data model.
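The HBase points above can be pictured as a sorted map of maps: row key, then column family, then column qualifier, then value. The following is a toy in-memory sketch of that layout in plain Java, not the HBase client API. The class name ToyTable and the sample row keys are assumptions for illustration, and cell versions (timestamps) are left out.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy in-memory model of an HBase-style table; not the HBase client API.
class ToyTable {
    // row key -> column family -> column qualifier -> value, rows kept sorted by key
    private final NavigableMap<String, Map<String, Map<String, String>>> rows = new TreeMap<>();

    // Random write: set one cell without rewriting the rest of the row.
    void put(String rowKey, String family, String qualifier, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(family, k -> new TreeMap<>())
            .put(qualifier, value);
    }

    // Random read: fetch one cell by its coordinates; returns null if it does not exist.
    String get(String rowKey, String family, String qualifier) {
        return rows.getOrDefault(rowKey, Collections.emptyMap())
                   .getOrDefault(family, Collections.emptyMap())
                   .get(qualifier);
    }

    // Range scan: all rows whose key lies in [startRow, stopRow).
    NavigableMap<String, Map<String, Map<String, String>>> scan(String startRow, String stopRow) {
        return rows.subMap(startRow, true, stopRow, false);
    }

    public static void main(String[] args) {
        ToyTable table = new ToyTable();
        table.put("user#001", "info", "name", "Asha");
        table.put("user#001", "info", "city", "Pune");
        table.put("user#002", "info", "name", "Ravi");
        table.put("user#001", "info", "city", "Delhi"); // update a single cell in place

        System.out.println(table.get("user#001", "info", "city"));         // prints Delhi
        System.out.println(table.scan("user#001", "user#002").keySet());   // prints [user#001]
    }
}
```

Rows do not have to share the same qualifiers, which is what makes the data model flexible. Keeping rows sorted by key is what makes both single-row lookups and short range scans cheap.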

Hadoop is a batch-processing tool, so on its own it is not suited to real-time data processing. HBase, which runs on top of HDFS, covers that gap with low-latency random reads and writes.