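HDFS is a Java-based distributed file system that lets you store large data sets across multiple nodes in a Hadoop cluster, whereas HBase is a NoSQL database that runs on top of HDFS. The relationship is similar to that between NTFS and MySQL: one is a file system, the other is a database that keeps its data on a file system.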
Both HDFS and HBase can store all kinds of data (structured, semi-structured, and unstructured) in a distributed environment.
Differences between HDFS & HBase
- HBase provides low-latency access to small amounts of data within large data sets, while HDFS is geared toward high-throughput access to whole files, so its individual operations have high latency.
- HBase supports random reads and writes, while HDFS follows a WORM (write once, read many) model.
- HDFS is primarily accessed through MapReduce jobs, while HBase is accessed through shell commands or through its Java, REST, Avro, or Thrift APIs.
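The difference between the WORM model and random reads and writes can be illustrated with a small toy model. This is only a sketch in plain Python, not the HDFS or HBase API: the classes `WriteOnceFile` and `SortedKeyValueStore` and the sample records are hypothetical names invented for the illustration.

```python
import bisect


class WriteOnceFile:
    """Toy stand-in for a write-once, read-many file (NOT the HDFS API)."""

    def __init__(self):
        self._records = []
        self._closed = False

    def append(self, record):
        # Writing is only possible until the file is closed.
        if self._closed:
            raise PermissionError("file is closed; contents are immutable")
        self._records.append(record)

    def close(self):
        self._closed = True

    def scan(self):
        # Reading means streaming through the records from start to end.
        return iter(self._records)


class SortedKeyValueStore:
    """Toy stand-in for a store indexed by row key (NOT the HBase API)."""

    def __init__(self):
        self._keys = []    # row keys kept in sorted order
        self._values = {}  # row key -> latest value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value  # a later put replaces the earlier value

    def get(self, key):
        # Random read: look up a single row directly by its key.
        return self._values.get(key)

    def scan(self, start, stop):
        # Range read over the sorted keys: start inclusive, stop exclusive.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


log = WriteOnceFile()
for line in ["user1,login", "user2,login", "user1,logout"]:
    log.append(line)
log.close()

# Finding one user's records requires scanning the whole file.
user1_events = [r for r in log.scan() if r.startswith("user1,")]

table = SortedKeyValueStore()
table.put("user1", "login")
table.put("user2", "login")
table.put("user1", "logout")  # random write: update an existing row

latest = table.get("user1")   # random read by key
```

In the toy file, the only way to find the records of one user is to read everything, which suits batch jobs. In the toy key-value store, a single row can be read or updated by key without touching the rest of the data.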
HDFS stores large data sets in a distributed environment and relies on batch processing of that data.
HBase, by contrast, stores data in a column-oriented manner: the values of each column are stored together, so reads of individual columns are faster, which supports real-time processing.
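Why storing the values of a column together speeds up reads can be sketched in a few lines of plain Python. This is a simplified illustration, not how HBase lays out data on disk (HBase groups columns into column families). The sample rows and the helper `to_columns` are hypothetical, and the sketch assumes every row has the same fields.

```python
# Sample data in a row-oriented layout: one record per row.
rows = [
    {"id": "r1", "name": "Ada", "city": "London", "age": 36},
    {"id": "r2", "name": "Lin", "city": "Taipei", "age": 41},
    {"id": "r3", "name": "Sam", "city": "Austin", "age": 29},
]


def to_columns(rows):
    """Regroup row records so that each column's values sit together."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every record is visited, and all of its fields come along.
ages_from_rows = [row["age"] for row in rows]

# Column layout: one list is read; the other columns are never touched.
ages_from_columns = columns["age"]

average_age = sum(ages_from_columns) / len(ages_from_columns)
```

Both layouts yield the same values. The column layout simply avoids loading the `name` and `city` fields when only `age` is needed, and that saving grows with the number of columns and rows.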