Let me explain the major differences between HDFS and HBase.
Hadoop is made up of three main components:
- Hadoop Distributed File System (HDFS)
- MapReduce
- YARN
HDFS is the storage layer of Hadoop; you can also think of it as Hadoop's file system.
MapReduce is the processing layer: a programming model, implemented in Java, that runs computations over the stored data in order to process it.
YARN acts as a mediator between HDFS and MapReduce, managing cluster resources and scheduling jobs.
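To make the MapReduce idea concrete, here is a minimal in-memory sketch of the map, shuffle, and reduce steps using the classic word-count example. It is plain Python rather than the Hadoop API, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for illustration; on a real cluster each phase would run in parallel across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["HDFS stores files", "HBase stores rows"])
```

Here `counts` maps each lowercased word to the number of times it appears across both lines.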
HDFS is designed to store very large amounts of data, distributed across commodity hardware.
It provides high-throughput sequential access to that data but lacks efficient random read and write capabilities.
This is where HBase comes in: it provides a NoSQL database on top of a Hadoop cluster and gives you real-time random reads and writes.
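The difference in access patterns can be sketched with two toy classes. This is only an illustration in plain Python, not the HDFS or HBase client API: `WriteOnceFile` and `KeyValueTable` are hypothetical names, and the sketch ignores replication, blocks, and the fact that real HDFS also allows appending to an existing file.

```python
class WriteOnceFile:
    """Toy stand-in for an HDFS-style file: written once, then read-only."""

    def __init__(self):
        self._records = []
        self._closed = False

    def append(self, record):
        if self._closed:
            raise OSError("file is closed; write-once files cannot be modified")
        self._records.append(record)

    def close(self):
        self._closed = True

    def scan(self):
        """Stream every record in order; finding one record means reading through."""
        yield from self._records


class KeyValueTable:
    """Toy stand-in for an HBase-style table: random reads and writes by row key."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, value):
        self._rows[row_key] = value  # updating a single row in place is allowed

    def get(self, row_key):
        return self._rows.get(row_key)


log = WriteOnceFile()
for i in range(5):
    log.append((f"user{i}", i * 10))
log.close()

# Sequential access: stream through the file until the record turns up.
found = next(value for key, value in log.scan() if key == "user3")

# Random access: load the same data into a table, then update one row directly.
table = KeyValueTable()
for key, value in log.scan():
    table.put(key, value)
table.put("user3", 99)
```

The file can only be streamed from start to finish once it is closed, while the table can read or overwrite any single row by its key.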
Both HBase and HDFS let you store and operate on data, but they organize it differently.
HDFS stores data in the form of files, whereas HBase stores data in the form of key-value pairs.
Some of the important differences are as follows.
HDFS:
- Is optimized for streaming access to large files.
- Follows a write-once, read-many model.
- Doesn't support random reads and writes.
HBase:
- Stores key-value pairs in a column-oriented fashion (columns are grouped together into column families).
- Provides low-latency access to small amounts of data from within a large data set.
- Provides a flexible data model.
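The column-family model described above can be pictured as a nested map: row key, then column family, then column qualifier, then value. The sketch below is a simplified in-memory model in plain Python, not the HBase client API; `ToyTable` and its methods are hypothetical, and it leaves out the timestamped cell versions that HBase also keeps. It assumes, as in HBase, that column families are declared up front, that qualifiers can vary from row to row, and that rows are kept sorted by row key.

```python
from bisect import bisect_left, insort


class ToyTable:
    """Hypothetical model: row key -> column family -> qualifier -> value."""

    def __init__(self, families):
        self.families = set(families)  # column families are fixed at creation
        self.rows = {}
        self.sorted_keys = []  # row keys kept in sorted order for range scans

    def put(self, row_key, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row_key not in self.rows:
            insort(self.sorted_keys, row_key)
            self.rows[row_key] = {}
        self.rows[row_key].setdefault(family, {})[qualifier] = value

    def get(self, row_key, family, qualifier):
        """Random read of a single cell; returns None if it does not exist."""
        return self.rows.get(row_key, {}).get(family, {}).get(qualifier)

    def scan(self, start, stop):
        """Yield (row_key, row) for start <= row_key < stop, in key order."""
        i = bisect_left(self.sorted_keys, start)
        while i < len(self.sorted_keys) and self.sorted_keys[i] < stop:
            key = self.sorted_keys[i]
            yield key, self.rows[key]
            i += 1


table = ToyTable(families=["info", "stats"])
table.put("user#002", "info", "name", "Bo")
table.put("user#001", "info", "name", "Ada")
table.put("user#001", "stats", "logins", 7)  # rows need not share the same columns
table.put("video#001", "info", "title", "Intro")

# Range scan over every row key that starts with "user#".
user_keys = [key for key, _ in table.scan("user#", "user$")]
```

Because rows need not have the same columns, `user#002` simply has no `stats:logins` cell, which is one way to read the "flexible data model" point.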
Hadoop (HDFS with MapReduce) is geared toward batch processing, whereas HBase is used in real-time data processing environments.