Hadoop generally consists of three major components:
- HDFS: the Hadoop Distributed File System, which stores big data across the cluster so that you can apply your business logic to it.
- MapReduce: a programming model, typically written in Java, that lets you apply your business logic through a mapper and a reducer and process the data in a distributed fashion.
- YARN: the resource manager, which allocates resources such as memory and processor units to the jobs that process the data stored in HDFS.
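The mapper and reducer roles described above can be illustrated with a small word count. This is a minimal sketch in plain Python that runs in a single process; it does not use the Hadoop API, and the names `mapper`, `shuffle`, `reducer`, and `word_count` are hypothetical, chosen only for illustration. In a real cluster the map and reduce calls would run on many nodes, and the framework would perform the grouping step between them.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def mapper(line: str) -> Iterator[tuple[str, int]]:
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle phase: group every emitted value under its key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce phase: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in mapper(line))
    return dict(reducer(key, values) for key, values in shuffle(mapped).items())
```

Calling `word_count(["big data", "big cluster"])` counts `big` twice and the other two words once each. The business logic lives entirely in `mapper` and `reducer`; everything else is plumbing that a framework would normally provide.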
The main limitation of HDFS is that it is built for sequential access: it cannot efficiently process data that is not read sequentially, and it offers no random read-write option. This is where HBase comes into the picture.
HDFS has the following characteristics:
- Real-time processing is not practical, because data must first be loaded into HDFS before any further operations can run.
- Follows the write-once, read-many-times principle.
- Designed for streaming access to data, that is, high-throughput sequential reads.
HBase, by contrast:
- Stores key/value pairs in a columnar fashion (columns are grouped together into column families).
- Provides low-latency access to small amounts of data from within a large data set.
- Provides a flexible data model.
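The HBase properties listed above (key/value pairs grouped into column families, fast lookups by key, and a flexible schema) can be sketched with a toy in-memory table. This is an illustration only, not the HBase client API: `ToyTable` and its `put`, `get`, and `scan` methods are hypothetical names, and the sketch assumes that row keys are kept in sorted order and that a scan includes its start key and excludes its stop key.

```python
from __future__ import annotations

import bisect


class ToyTable:
    """Toy model: row key -> column family -> qualifier -> value."""

    def __init__(self, column_families: list[str]) -> None:
        # Column families are fixed up front; qualifiers inside them are not.
        self.column_families = set(column_families)
        self._rows: dict[str, dict[str, dict[str, str]]] = {}
        self._sorted_keys: list[str] = []  # row keys kept in sorted order

    def put(self, row_key: str, family: str, qualifier: str, value: str) -> None:
        """Random write: set or overwrite one cell without touching others."""
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        if row_key not in self._rows:
            bisect.insort(self._sorted_keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key].setdefault(family, {})[qualifier] = value

    def get(self, row_key: str, family: str, qualifier: str) -> str | None:
        """Random read: fetch one cell directly by its key, or None."""
        return self._rows.get(row_key, {}).get(family, {}).get(qualifier)

    def scan(self, start: str, stop: str) -> list[str]:
        """Return the row keys in [start, stop) using the sorted key order."""
        lo = bisect.bisect_left(self._sorted_keys, start)
        hi = bisect.bisect_left(self._sorted_keys, stop)
        return self._sorted_keys[lo:hi]
```

Two rows in the same table can carry different qualifiers under one column family, which is what makes the data model flexible. A `put` on an existing cell overwrites it in place, which is the kind of random write that HDFS on its own does not offer.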
Hadoop is a batch processing tool, so it is not suited to real-time data processing.