Hadoop Interview Questions On HBase In 2024

Last updated on Nov 02, 2023
Shubham Sinha is a Big Data and Hadoop expert working as a Research Analyst at Edureka. He is keen to work with Big Data...

Apache HBase Interview Questions

Looking for the Apache HBase interview questions that employers ask most often? Here is the blog on Apache HBase interview questions in our Hadoop Interview Questions series. I hope you have not missed the earlier blogs of the series.

After going through these HBase interview questions, you will have in-depth knowledge of the questions employers frequently ask about HBase in Hadoop interviews. This will help you kickstart your career as a Big Data Engineer and become a certified Big Data professional.

If you have attended an HBase interview before, we encourage you to add your questions in the comments section. We will be happy to answer them and spread the word to the community of fellow job seekers.

♣ Tip: Before going through these Apache HBase interview questions, I suggest you go through the Apache HBase Tutorial and HBase Architecture blogs to revise your HBase concepts.

Now moving on, let us look at the Apache HBase interview questions.

1. What are the key components of HBase?

The key components of HBase are ZooKeeper, Region Server and HMaster (the HBase Master).

Key components of HBase

Component | Description
Region Server | A table can be divided into several regions. A group of regions is served to clients by a Region Server.
HMaster | Coordinates and manages the Region Servers, similar to how the NameNode manages DataNodes in HDFS.
ZooKeeper | Acts as a coordinator inside the HBase distributed environment. It helps maintain server state inside the cluster by communicating through sessions.

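2. When would you use HBase?

HBase is a good fit when you need random, real-time read and write access to very large datasets, for example tables with billions of rows and millions of columns stored on a cluster of commodity hardware. It also offers strongly consistent reads and writes.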

3. What is the use of get() method?

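The get() method reads data from a table. You pass it a Get object built from a row key, and it returns a Result containing the cells of that row. A Get can be narrowed to specific column families, columns, versions or a time range.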
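The following is a minimal in-memory sketch, not the HBase client API: it models a table as the nested sorted map that HBase's data model describes (row key, then family:qualifier, then timestamp, then value). The class HBaseGetSketch and its put/get helpers are hypothetical; get returns the newest value of each column in a row, optionally restricted to one column family, which is roughly what a Get with default version settings returns.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Simplified in-memory model of an HBase table (NOT the HBase client API):
// row key -> "family:qualifier" -> timestamp (newest first) -> value
class HBaseGetSketch {
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    // Store one cell version, like a Put with an explicit timestamp.
    void put(String row, String family, String qualifier, long ts, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family + ":" + qualifier, c -> new TreeMap<>(Collections.reverseOrder()))
            .put(ts, value);
    }

    // Newest value of every column in the row; family == null means "all families".
    Map<String, String> get(String row, String family) {
        Map<String, String> result = new TreeMap<>();
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null) {
            return result; // unknown row: empty result
        }
        for (Map.Entry<String, NavigableMap<Long, String>> col : columns.entrySet()) {
            if (family == null || col.getKey().startsWith(family + ":")) {
                // firstEntry() is the newest version because timestamps sort in reverse
                result.put(col.getKey(), col.getValue().firstEntry().getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        HBaseGetSketch users = new HBaseGetSketch();
        users.put("user1", "personal", "name", 1L, "Ann");
        users.put("user1", "personal", "name", 2L, "Anna");
        users.put("user1", "professional", "role", 1L, "Engineer");
        System.out.println(users.get("user1", null));       // {personal:name=Anna, professional:role=Engineer}
        System.out.println(users.get("user1", "personal")); // {personal:name=Anna}
    }
}
```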

4. Define the difference between Hive and HBase?

Apache Hive is a data warehousing infrastructure built on top of Hadoop. It helps in querying data stored in HDFS for analysis using Hive Query Language (HQL), which is a SQL-like language, that gets translated into MapReduce jobs. Hive performs batch processing on Hadoop.

Apache HBase is a NoSQL key/value store that runs on top of HDFS. Unlike Hive, HBase operations run in real time on its database rather than as MapReduce jobs. HBase partitions tables into regions, and tables are further split into column families.

Hive and HBase are two different Hadoop based technologies – Hive is an SQL-like engine that runs MapReduce jobs, and HBase is a NoSQL key/value database of Hadoop. We can use them together. Hive can be used for analytical queries while HBase for real-time querying. Data can even be read and written from HBase to Hive and vice-versa.

5. Explain the data model of HBase.

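HBase comprises:
- Tables, each made up of rows and column families.
- A row key for every row, which acts as the primary key; every access to an HBase table goes through it, and rows are stored sorted by row key.
- Column families, which group related columns (column qualifiers).
- Cells, identified by row key, column family, column qualifier and timestamp, which hold the values; each timestamp identifies one version of a cell.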

6. Define column families?

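A column family is a collection of columns, whereas a row is a collection of column families. Column families are declared when the table is created, while the columns (qualifiers) inside a family can be added on the fly as data is written. All the columns of a family are stored together on disk.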

7. Define standalone mode in HBase?

It is the default mode of HBase. In standalone mode, HBase does not use HDFS; it uses the local filesystem instead, and it runs all HBase daemons and a local ZooKeeper in the same JVM process.

8. What are decorating filters?

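Sometimes it is useful to modify, or extend, the behavior of a filter to gain additional control over the returned data. Filters that wrap another filter to do this are known as decorating filters. They include SkipFilter, which skips an entire row if any of its cells fails the wrapped filter, and WhileMatchFilter, which stops the whole scan as soon as the wrapped filter rejects data.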
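To illustrate the difference between the two decorating filters, here is a toy model in plain Java, assuming a row is just a list of integer cell values and the wrapped filter is a cell-level predicate. The names DecoratingFilterSketch, skipFilterScan and whileMatchScan are hypothetical; the real classes are org.apache.hadoop.hbase.filter.SkipFilter and WhileMatchFilter. The demo mirrors the common SkipFilter use of skipping every row that contains a 0 value. The sketch works at row granularity, whereas the real WhileMatchFilter can stop in the middle of a row.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Toy model of two decorating filters (NOT the HBase Filter API).
// Rows map a row key to its cell values; the wrapped filter is a cell-level
// predicate where true means "this cell passes".
class DecoratingFilterSketch {

    // SkipFilter-like: drop an entire row if ANY of its cells fails the wrapped filter.
    static List<String> skipFilterScan(Map<String, List<Integer>> rows, Predicate<Integer> wrapped) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> row : rows.entrySet()) {
            if (row.getValue().stream().allMatch(wrapped)) {
                out.add(row.getKey());
            }
        }
        return out;
    }

    // WhileMatchFilter-like: end the whole scan at the first row the wrapped filter rejects.
    static List<String> whileMatchScan(Map<String, List<Integer>> rows, Predicate<Integer> wrapped) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> row : rows.entrySet()) {
            if (!row.getValue().stream().allMatch(wrapped)) {
                break; // no further rows are returned
            }
            out.add(row.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> rows = new LinkedHashMap<>(); // kept in row-key order
        rows.put("row1", List.of(1, 2));
        rows.put("row2", List.of(3, 0));
        rows.put("row3", List.of(4, 5));
        Predicate<Integer> nonZero = v -> v != 0;
        System.out.println(skipFilterScan(rows, nonZero)); // [row1, row3]
        System.out.println(whileMatchScan(rows, nonZero)); // [row1]
    }
}
```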

9. What is RegionServer?

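A RegionServer serves regions to clients. A table can be divided into several regions, each covering a contiguous range of row keys, and every RegionServer handles the read and write requests for the group of regions assigned to it.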
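How a row key maps to a region can be sketched as a sorted-map lookup: each region is identified by its start key, and the region for a key is the one with the greatest start key not above it. This is a simplified model, assuming the region list is already known (real clients read it from the hbase:meta catalog table and cache it), and using Java String ordering in place of HBase's byte-wise ordering, which agrees for ASCII keys. The class RegionLookupSketch and the server names rs1 and rs2 are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Simplified model of locating the region (and its RegionServer) for a row key.
// Each region covers [startKey, next region's startKey); the first region starts at "".
class RegionLookupSketch {
    private final TreeMap<String, String> serverByStartKey = new TreeMap<>();

    void addRegion(String startKey, String server) {
        serverByStartKey.put(startKey, server);
    }

    // The region holding a row is the one with the greatest start key <= row key.
    String serverFor(String rowKey) {
        Map.Entry<String, String> e = serverByStartKey.floorEntry(rowKey);
        if (e == null) {
            throw new IllegalStateException("no region covers " + rowKey);
        }
        return e.getValue();
    }

    public static void main(String[] args) {
        RegionLookupSketch table = new RegionLookupSketch();
        table.addRegion("", "rs1");  // keys before "g"
        table.addRegion("g", "rs2"); // "g" up to (not including) "p"
        table.addRegion("p", "rs1"); // "p" and beyond
        System.out.println(table.serverFor("apple")); // rs1
        System.out.println(table.serverFor("kiwi"));  // rs2
        System.out.println(table.serverFor("zebra")); // rs1
    }
}
```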

10. What are the data manipulation commands of HBase?

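The data manipulation commands of HBase are:
- put: puts a cell value at a specified column in a specified row of a table.
- get: fetches the contents of a row or a cell.
- delete: deletes a cell value in a table.
- deleteall: deletes all the cells in a given row.
- scan: scans and returns the table data.
- count: counts and returns the number of rows in a table.
- truncate: disables, drops and recreates a specified table.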

11. Which code is used to open a connection in HBase?

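The following code opens an HBase connection and gets a handle to the users table. Configuration comes from org.apache.hadoop.conf, TableName and HBaseConfiguration from org.apache.hadoop.hbase, and Connection, ConnectionFactory and Table from org.apache.hadoop.hbase.client (older releases used the HTable constructor instead):

Configuration myConf = HBaseConfiguration.create();
Connection connection = ConnectionFactory.createConnection(myConf); // heavyweight: create once, share it, close it when done
Table table = connection.getTable(TableName.valueOf("users"));     // lightweight: close it after use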

12. What is the use of truncate command?

The truncate command disables, drops and recreates the specified table. All of the table's data is removed, while its schema is kept.

♣ Tip: To delete a table, first disable it and then drop it.

13. What happens when you issue a delete command in HBase?

When you issue a delete command in HBase for a cell, column or column family, the data is not deleted immediately. Instead, a tombstone marker is inserted. A tombstone is a special entry stored alongside the regular data, and it hides all the data it marks as deleted.

The actual data is removed during a major compaction. In a major compaction, HBase merges and rewrites all the HFiles of each column family in a region into a single new HFile, dropping deleted and expired cells in the process. Until then, get and scan results filter out the deleted cells.
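The delete lifecycle described above can be modelled in a few lines. This is a toy sketch, not HBase internals: a delete only appends a tombstone that masks versions of the column with a timestamp at or below its own, reads skip masked cells, and majorCompact() rewrites the store without masked cells or tombstones. TombstoneSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Toy model of HBase deletes (NOT HBase internals).
class TombstoneSketch {
    static final class Cell {
        final String column;
        final long ts;
        final String value;      // null for a tombstone
        final boolean tombstone;

        Cell(String column, long ts, String value, boolean tombstone) {
            this.column = column;
            this.ts = ts;
            this.value = value;
            this.tombstone = tombstone;
        }
    }

    private final List<Cell> cells = new ArrayList<>(); // everything written so far

    void put(String column, long ts, String value) {
        cells.add(new Cell(column, ts, value, false));
    }

    // Column-style delete: only appends a marker; nothing is removed yet.
    void delete(String column, long deleteTs) {
        cells.add(new Cell(column, deleteTs, null, true));
    }

    // A cell is hidden if a tombstone for its column has an equal or newer timestamp.
    private boolean masked(Cell c) {
        for (Cell t : cells) {
            if (t.tombstone && t.column.equals(c.column) && t.ts >= c.ts) {
                return true;
            }
        }
        return false;
    }

    // Read path: newest visible value of a column.
    Optional<String> get(String column) {
        return cells.stream()
                .filter(c -> !c.tombstone && c.column.equals(column) && !masked(c))
                .max(Comparator.comparingLong(c -> c.ts))
                .map(c -> c.value);
    }

    // Major compaction: keep only live cells; masked cells and tombstones disappear.
    void majorCompact() {
        List<Cell> kept = new ArrayList<>();
        for (Cell c : cells) {
            if (!c.tombstone && !masked(c)) {
                kept.add(c);
            }
        }
        cells.clear();
        cells.addAll(kept);
    }

    int storedCells() {
        return cells.size();
    }

    public static void main(String[] args) {
        TombstoneSketch store = new TombstoneSketch();
        store.put("cf:name", 1, "Ann");
        store.put("cf:city", 1, "Pune");
        store.delete("cf:name", 2);
        System.out.println(store.get("cf:name") + " " + store.storedCells()); // Optional.empty 3
        store.majorCompact();
        System.out.println(store.get("cf:city") + " " + store.storedCells()); // Optional[Pune] 1
    }
}
```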

14. What are different tombstone markers in HBase?

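There are three types of tombstone markers in HBase:
- Version delete marker: marks a single version of a column for deletion.
- Column delete marker: marks all the versions of a column for deletion.
- Family delete marker: marks all the columns of a column family for deletion.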

15. HBase blocksize is configured on which level?

The blocksize is configured per column family and the default value is 64 KB. This value can be changed as per requirements.

16. Which command is used to run HBase Shell?

Run ./bin/hbase shell from the HBase installation directory to start the HBase shell.

17. Which command is used to show the current HBase user?

The whoami command shows the current HBase user.

18. What is the full form of MSLAB?

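MSLAB stands for MemStore-Local Allocation Buffer. Whenever a request thread needs to insert data into a MemStore, it does not allocate the space for that data from the heap at large; instead, it allocates it from a memory arena dedicated to the target region. Keeping each region's data in its own chunks reduces old-generation heap fragmentation and the long garbage-collection pauses it can cause.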
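A minimal sketch of the MSLAB idea, assuming fixed-size chunks owned by one region and a simple bump-pointer offset: cell bytes are copied into the current chunk, and a new chunk is allocated only when the current one is full. MslabSketch is a hypothetical class, not HBase's MemStoreLAB implementation. HBase's real chunks are much larger (2 MB by default), and cells above a size threshold bypass MSLAB rather than being rejected as they are here.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of MSLAB-style allocation (NOT HBase's MemStoreLAB).
class MslabSketch {
    private final int chunkSize;
    private final List<byte[]> chunks = new ArrayList<>();
    private int offset; // next free byte in the current chunk

    MslabSketch(int chunkSize) {
        this.chunkSize = chunkSize;
        this.offset = chunkSize; // forces a chunk to be allocated on first use
    }

    // Copy the cell bytes into the current chunk; returns {chunkIndex, offsetInChunk}.
    int[] copyIn(byte[] cell) {
        if (cell.length > chunkSize) {
            throw new IllegalArgumentException("cell larger than a chunk");
        }
        if (offset + cell.length > chunkSize) {
            chunks.add(new byte[chunkSize]); // current chunk is full: start a new one
            offset = 0;
        }
        int chunkIndex = chunks.size() - 1;
        System.arraycopy(cell, 0, chunks.get(chunkIndex), offset, cell.length);
        int[] location = {chunkIndex, offset};
        offset += cell.length;
        return location;
    }

    int chunkCount() {
        return chunks.size();
    }

    public static void main(String[] args) {
        MslabSketch region = new MslabSketch(16); // tiny chunk size for the demo
        region.copyIn(new byte[10]);
        region.copyIn(new byte[5]); // still fits: 15 of 16 bytes used
        region.copyIn(new byte[4]); // does not fit: a second chunk is allocated
        System.out.println(region.chunkCount()); // 2
    }
}
```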

19. Define LZO?

Lempel-Ziv-Oberhumer (LZO) is a lossless data compression algorithm that focuses on decompression speed.

20. What is HBase Fsck?

HBase comes with a tool called hbck, which is implemented by the HBaseFsck class. HBaseFsck (hbck) checks for region consistency and table integrity problems and repairs a corrupted HBase installation. It works in two basic modes: a read-only inconsistency-identifying mode and a multi-phase read-write repair mode.
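One class of problem hbck reports is a broken region chain: the regions of a table, sorted by start key, should cover the whole key space with no holes and no overlaps. The check below is a simplified sketch of that idea, not the HBaseFsck code. RegionChainCheck is a hypothetical name, and an empty string stands for the start or end of the table, as it does for HBase's first and last regions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified region-chain check (NOT the HBaseFsck implementation).
class RegionChainCheck {
    static final class Region {
        final String start; // "" = start of table
        final String end;   // "" = end of table

        Region(String start, String end) {
            this.start = start;
            this.end = end;
        }
    }

    static List<String> check(List<Region> regions) {
        List<String> problems = new ArrayList<>();
        if (regions.isEmpty()) {
            problems.add("table has no regions");
            return problems;
        }
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparing((Region r) -> r.start));
        if (!sorted.get(0).start.isEmpty()) {
            problems.add("hole at start of table");
        }
        for (int i = 1; i < sorted.size(); i++) {
            Region prev = sorted.get(i - 1);
            Region next = sorted.get(i);
            // An empty end key means "end of table", so it lies after every start key.
            int cmp = prev.end.isEmpty() ? 1 : prev.end.compareTo(next.start);
            if (cmp < 0) {
                problems.add("hole between " + prev.end + " and " + next.start);
            } else if (cmp > 0) {
                problems.add("overlap at " + next.start);
            }
        }
        if (!sorted.get(sorted.size() - 1).end.isEmpty()) {
            problems.add("hole at end of table");
        }
        return problems;
    }

    public static void main(String[] args) {
        List<Region> healthy = List.of(new Region("", "g"), new Region("g", "p"), new Region("p", ""));
        List<Region> broken = List.of(new Region("", "g"), new Region("h", "p"), new Region("m", ""));
        System.out.println(check(healthy)); // []
        System.out.println(check(broken));  // [hole between g and h, overlap at m]
    }
}
```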

21. What is REST?

REST stands for Representational State Transfer, which defines the semantics so that the protocol can be used in a generic way to address remote resources. It also supports different message formats, offering many choices for a client application to communicate with the server. HBase ships with a REST server that exposes its tables over HTTP.

22. What is Thrift?

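Apache Thrift is a framework for building cross-language services. It is written in C++, but provides schema compilers for many programming languages, including Java, C++, Perl, PHP, Python, Ruby, and more. HBase includes a Thrift server so that clients written in these languages can access it.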

23. What is Nagios?

Nagios is a very commonly used support tool for gaining qualitative data regarding cluster status. It polls current metrics on a regular basis and compares them with given thresholds.
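The threshold comparison can be sketched as follows, assuming a metric where higher is worse and the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). NagiosThresholdSketch, the metric and the threshold values are hypothetical, and real Nagios plugins support a richer range syntax than this single upper bound.

```java
// Sketch of a Nagios-style threshold check (NOT an actual Nagios plugin).
// Exit-code convention: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
class NagiosThresholdSketch {
    static final String[] LABELS = {"OK", "WARNING", "CRITICAL", "UNKNOWN"};

    // Higher values are worse; a missing reading is UNKNOWN.
    static int evaluate(Double value, double warn, double crit) {
        if (value == null || value.isNaN()) {
            return 3; // UNKNOWN: the metric could not be polled
        }
        if (value > crit) {
            return 2; // CRITICAL
        }
        if (value > warn) {
            return 1; // WARNING
        }
        return 0; // OK
    }

    public static void main(String[] args) {
        double heapUsedPercent = 87.5; // hypothetical polled RegionServer metric
        int code = evaluate(heapUsedPercent, 80.0, 95.0);
        // A real plugin would print one status line like this and exit with `code`.
        System.out.println(LABELS[code] + " - heap used " + heapUsedPercent + "%"); // WARNING - heap used 87.5%
    }
}
```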

24. What is the use of ZooKeeper?

ZooKeeper maintains configuration information and handles communication between RegionServers and clients. It also provides distributed synchronization and helps maintain server state inside the cluster by communicating through sessions.

Every RegionServer, along with the HMaster server, sends heartbeats to ZooKeeper at regular intervals, which lets ZooKeeper track which servers are alive and available. It also provides server failure notifications so that recovery measures can be executed.
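Liveness tracking of this kind can be modelled as a session timeout: a server stays registered while heartbeats keep arriving, and is reported as failed once the gap exceeds the timeout. This is a toy model, not the ZooKeeper API (in ZooKeeper, an expired session removes the server's ephemeral znode, and watchers such as the HMaster are notified). The class name HeartbeatMonitorSketch and the 30-second timeout are assumptions for the demo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of heartbeat-based session expiry (NOT the ZooKeeper API).
class HeartbeatMonitorSketch {
    private final long sessionTimeoutMs;
    private final Map<String, Long> lastHeartbeatMs = new TreeMap<>();

    HeartbeatMonitorSketch(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    void heartbeat(String server, long nowMs) {
        lastHeartbeatMs.put(server, nowMs);
    }

    // Remove and return the servers whose sessions have expired at nowMs.
    List<String> expireSessions(long nowMs) {
        List<String> expired = new ArrayList<>();
        lastHeartbeatMs.entrySet().removeIf(e -> {
            boolean dead = nowMs - e.getValue() > sessionTimeoutMs;
            if (dead) {
                expired.add(e.getKey()); // the "failure notification" for this server
            }
            return dead;
        });
        return expired;
    }

    public static void main(String[] args) {
        HeartbeatMonitorSketch zk = new HeartbeatMonitorSketch(30_000);
        zk.heartbeat("rs1", 0);
        zk.heartbeat("rs2", 0);
        zk.heartbeat("rs1", 25_000); // rs1 keeps heartbeating, rs2 goes silent
        System.out.println(zk.expireSessions(40_000)); // [rs2]
    }
}
```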

25. Define catalog tables in HBase?

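Catalog tables maintain HBase's metadata. The hbase:meta table (called .META. in older releases) records every region and the RegionServer hosting it; releases before 0.96 also had a -ROOT- table that tracked where .META. was located.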

26. Define compaction in HBase?

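HBase combines HFiles to reduce storage and the number of disk seeks needed for a read. This process is called compaction. Compaction chooses some HFiles from a region and combines them. There are two types of compaction:
- Minor compaction: HBase picks a subset of smaller HFiles and rewrites them into one larger HFile.
- Major compaction: HBase merges all the HFiles of a column family in a region into a single HFile and, in the process, drops deleted and expired cells.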
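At its core, compaction is a k-way merge of already-sorted files into one sorted file. The sketch below shows only that merge, using a priority queue over the current head of each input. CompactionMergeSketch is a hypothetical name, and each "HFile" is reduced to a sorted list of keys; real HFiles also carry versions, block indexes and Bloom filters, and a major compaction would additionally drop deleted and expired cells.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of the merge step of a compaction (NOT HBase's Compactor).
class CompactionMergeSketch {

    static List<String> merge(List<List<String>> sortedFiles) {
        // Heap entry: the current key of one file plus the rest of that file.
        class Head {
            final String key;
            final Iterator<String> rest;

            Head(String key, Iterator<String> rest) {
                this.key = key;
                this.rest = rest;
            }
        }
        PriorityQueue<Head> heap = new PriorityQueue<>((a, b) -> a.key.compareTo(b.key));
        for (List<String> file : sortedFiles) {
            Iterator<String> it = file.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head smallest = heap.poll(); // next key in global sorted order
            merged.add(smallest.key);
            if (smallest.rest.hasNext()) { // refill from the same file
                heap.add(new Head(smallest.rest.next(), smallest.rest));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> hfiles = List.of(
                List.of("row1", "row4"),
                List.of("row2", "row5"),
                List.of("row3"));
        System.out.println(merge(hfiles)); // [row1, row2, row3, row4, row5]
    }
}
```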

27. What is the use of HColumnDescriptor class?

HColumnDescriptor stores the information about a column family like compression settings, number of versions etc. It is used as input when creating a table or adding a column.

28. Which filter accepts the page size as a parameter in HBase?

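PageFilter accepts the page size as its parameter. It is an implementation of the Filter interface that limits results to a specific page size: it stops scanning once the number of rows that passed the filter exceeds the given page size. Because the filter is applied separately on each RegionServer, a scan across several regions can still return more rows than the page size, so the client should also stop reading once it has a full page.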

Syntax: PageFilter (<page_size>)
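A common client-side paging pattern, sketched here over an in-memory sorted set rather than the HBase Scan API, is to stop after pageSize rows and start the next page strictly after the last row key returned (in HBase 2.x, Scan.withStartRow(lastRow, false) expresses that exclusive start). PaginationSketch and fetchPage are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Sketch of paging through sorted row keys (NOT the HBase Scan/PageFilter API).
class PaginationSketch {
    static List<String> fetchPage(NavigableSet<String> rowKeys, String lastRowKey, int pageSize) {
        NavigableSet<String> remaining =
                (lastRowKey == null) ? rowKeys : rowKeys.tailSet(lastRowKey, false); // exclusive start
        List<String> page = new ArrayList<>();
        for (String row : remaining) {
            if (page.size() == pageSize) {
                break; // client-side limit, independent of any server-side filter
            }
            page.add(row);
        }
        return page;
    }

    public static void main(String[] args) {
        NavigableSet<String> rows = new TreeSet<>(List.of("r1", "r2", "r3", "r4", "r5"));
        String last = null;
        List<String> page;
        do {
            page = fetchPage(rows, last, 2);
            if (!page.isEmpty()) {
                System.out.println(page); // [r1, r2] then [r3, r4] then [r5]
                last = page.get(page.size() - 1);
            }
        } while (page.size() == 2);
    }
}
```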

29. How will you design or modify schema in HBase programmatically?

HBase schemas can be created or updated using the Apache HBase Shell or by using Admin in the Java API.

Creating table schema:

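import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;         // HBaseConfiguration, HTableDescriptor, HColumnDescriptor, TableName
import org.apache.hadoop.hbase.client.*;  // Connection, ConnectionFactory, Admin

Configuration config = HBaseConfiguration.create();
try (Connection connection = ConnectionFactory.createConnection(config);
     Admin admin = connection.getAdmin()) { // execute commands through admin

    // Instantiating table descriptor class
    HTableDescriptor t1 = new HTableDescriptor(TableName.valueOf("employee"));

    // Adding column families to t1
    t1.addFamily(new HColumnDescriptor("professional"));
    t1.addFamily(new HColumnDescriptor("personal"));

    // Create the table through admin
    admin.createTable(t1);
}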

♣ Tip: Tables must be disabled when making ColumnFamily modifications.

For modification:

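TableName table = TableName.valueOf("myTable");
HColumnDescriptor cf2 = new HColumnDescriptor("cf2"); // new descriptor for existing family "cf2" (example name); unset options revert to defaults
cf2.setMaxVersions(5);          // the change to apply (example)
admin.disableTable(table);
admin.modifyColumn(table, cf2); // modifying existing ColumnFamily (modifyColumnFamily in HBase 2.x)
admin.enableTable(table);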

30. What filters are available in Apache HBase?

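The filters supported by HBase include ColumnPrefixFilter, MultipleColumnPrefixFilter, ColumnRangeFilter, ColumnPaginationFilter, ColumnCountGetFilter, TimestampsFilter, PageFilter, FamilyFilter, QualifierFilter, RowFilter, ValueFilter, PrefixFilter, InclusiveStopFilter, SingleColumnValueFilter, SingleColumnValueExcludeFilter, DependentColumnFilter, FirstKeyOnlyFilter and KeyOnlyFilter, together with the decorating filters SkipFilter and WhileMatchFilter and FilterList for combining several filters.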

31. How do we back up a HBase cluster?

There are two broad strategies for performing HBase backups: backing up with a full cluster shutdown, and backing up on a live cluster. Each approach has benefits and limitations.

Full Shutdown Backup

Some environments can tolerate a periodic full shutdown of their HBase cluster, for example, if it is being used as a back-end process and not serving front-end webpages.

Live Cluster Backup

Environments that cannot tolerate downtime use live cluster backup.

32. How does HBase handle write failures?

Failures are common in large distributed systems, and HBase is no exception.

If the server hosting a MemStore that has not yet been flushed crashes, the data that was in memory but not yet persisted is lost. HBase safeguards against this by writing to the WAL (Write Ahead Log) before the write completes.

Every server that is part of the HBase cluster keeps a WAL to record changes as they happen. The WAL is a file on the underlying file system. A write isn't considered successful until the new WAL entry is successfully written. This guarantee makes HBase as durable as the file system backing it. Most of the time, HBase is backed by the Hadoop Distributed Filesystem (HDFS). If a server goes down, the data that was not yet flushed from the MemStore to an HFile can be recovered by replaying the WAL.
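The write-ahead rule can be illustrated with a toy model, not HBase's WAL classes: each edit is appended to a durable log before it is applied to the MemStore, so after a simulated crash the MemStore can be rebuilt by replaying the log in order. WalReplaySketch is a hypothetical name, and the sketch ignores flushes, after which HBase no longer needs the corresponding log entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of write-ahead logging (NOT HBase's WAL classes).
class WalReplaySketch {
    final List<String[]> wal = new ArrayList<>(); // stands in for the log file on HDFS
    Map<String, String> memstore = new TreeMap<>(); // volatile, lost on crash

    void put(String key, String value) {
        wal.add(new String[] {key, value}); // 1. persist the edit in the log first
        memstore.put(key, value);           // 2. only then apply it in memory
    }

    void crash() {
        memstore = new TreeMap<>(); // unflushed in-memory data is gone
    }

    void replay() {
        for (String[] edit : wal) { // re-apply edits in log order; later edits win
            memstore.put(edit[0], edit[1]);
        }
    }

    public static void main(String[] args) {
        WalReplaySketch server = new WalReplaySketch();
        server.put("row1", "a");
        server.put("row1", "b");
        server.crash();
        System.out.println(server.memstore); // {}
        server.replay();
        System.out.println(server.memstore); // {row1=b}
    }
}
```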

33. When reading data from HBase, from which three places is the data reconciled before the value is returned?

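A read reconciles the cells it finds in three places:
- Block cache: recently read HFile blocks are served from this read cache without touching disk.
- MemStore: the write cache holding the most recent writes that have not yet been flushed to HFiles.
- HFiles: the remaining data is loaded from the HFiles on disk, using Bloom filters and block indexes to skip files and blocks that cannot contain the row.
The cells from these sources are merged so that the newest version of each cell is returned.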

34. Can you explain data versioning?

In addition to being a schema-less database, HBase is also versioned.

Every time you perform an operation on a cell, HBase implicitly stores a new version. Creating, modifying and deleting a cell are all treated identically: they all create new versions. When a cell exceeds the maximum number of versions, the extra versions are dropped during major compaction.

Instead of deleting an entire cell, you can operate on a specific version within that cell. Values within a cell are versioned, and each version is identified by its timestamp. If you do not specify a timestamp when writing, HBase uses the current time; if you do not specify a version when reading, the most recent version is returned. The default maximum number of versions per cell is one in HBase 0.96 and later (it was three in earlier releases).
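A single cell's versions can be pictured as a map from timestamp to value. In this hypothetical CellVersionsSketch, get() returns the newest version, getAsOf(ts) returns the newest version written at or before ts (similar to reading with a time range), and versions beyond maxVersions are trimmed on write. That trimming is a simplification: HBase hides extra versions from reads and physically removes them during major compaction.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of one cell's versions (NOT the HBase client API).
class CellVersionsSketch {
    private final int maxVersions;
    private final TreeMap<Long, String> versions = new TreeMap<>(); // timestamp -> value

    CellVersionsSketch(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    void put(long timestamp, String value) {
        versions.put(timestamp, value); // writing the same timestamp again replaces that version
        while (versions.size() > maxVersions) {
            versions.pollFirstEntry(); // drop the oldest version
        }
    }

    // No version requested: return the newest one.
    String get() {
        Map.Entry<Long, String> newest = versions.lastEntry();
        return newest == null ? null : newest.getValue();
    }

    // Newest version whose timestamp is <= the given timestamp.
    String getAsOf(long timestamp) {
        Map.Entry<Long, String> e = versions.floorEntry(timestamp);
        return e == null ? null : e.getValue();
    }

    int versionCount() {
        return versions.size();
    }

    public static void main(String[] args) {
        CellVersionsSketch cell = new CellVersionsSketch(3);
        cell.put(1, "a");
        cell.put(2, "b");
        cell.put(3, "c");
        cell.put(4, "d");
        System.out.println(cell.get());      // d
        System.out.println(cell.getAsOf(3)); // c
        System.out.println(cell.getAsOf(1)); // null: version 1 was dropped
    }
}
```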

35. What is a Bloom filter and how does it help in searching rows?

HBase supports Bloom filters to improve the overall throughput of the cluster. An HBase Bloom filter is a space-efficient mechanism to test whether an HFile contains a specific row or row-column cell.

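Without a Bloom filter, the only way to decide whether a row key is present in an HFile is to check the HFile's block index, which stores the start row key of each block in the HFile. Many rows can fall between two consecutive start keys, so HBase has to load the block and scan its keys to find out whether that row key actually exists. A Bloom filter answers the question first: if it reports the key as absent, the key is definitely not in the HFile and the block is never read; if it reports the key as possibly present, HBase reads the block. Bloom filters can return false positives but never false negatives.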
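A minimal Bloom filter for row keys makes the "definitely absent / possibly present" behavior concrete. This is a generic sketch, not HBase's implementation: the class name BloomFilterSketch, the bit-array size, the number of hash functions and the double-hashing scheme (Java's Arrays.hashCode combined with 32-bit FNV-1a) are all assumptions. In HBase, the filter type is chosen per column family through the BLOOMFILTER attribute (for example ROW or ROWCOL).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

// Generic Bloom filter sketch (NOT HBase's BloomFilter implementation).
class BloomFilterSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    BloomFilterSketch(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // 32-bit FNV-1a, used as a second hash independent of Arrays.hashCode.
    private static int fnv1a(byte[] data) {
        int hash = 0x811C9DC5; // FNV offset basis
        for (byte b : data) {
            hash ^= (b & 0xFF);
            hash *= 0x01000193; // FNV prime
        }
        return hash;
    }

    // Bit positions via double hashing: (h1 + i * h2) mod numBits, for i = 0..numHashes-1.
    private int[] positions(String rowKey) {
        byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
        int h1 = Arrays.hashCode(key);
        int h2 = fnv1a(key);
        int[] pos = new int[numHashes];
        for (int i = 0; i < numHashes; i++) {
            pos[i] = Math.floorMod(h1 + i * h2, numBits);
        }
        return pos;
    }

    void add(String rowKey) {
        for (int p : positions(rowKey)) {
            bits.set(p);
        }
    }

    // false: definitely not in this HFile (skip it); true: possibly present (read the block).
    boolean mightContain(String rowKey) {
        for (int p : positions(rowKey)) {
            if (!bits.get(p)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch hfileBloom = new BloomFilterSketch(1024, 3);
        hfileBloom.add("row1");
        hfileBloom.add("row2");
        System.out.println(hfileBloom.mightContain("row1"));   // true (no false negatives)
        System.out.println(hfileBloom.mightContain("row999")); // usually false; false positives are possible
    }
}
```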

Conclusion:

I hope these Apache HBase interview questions were helpful. This is just one part of our Hadoop Interview Question series, so do read the other blogs in the series as well.

Got a question for us? Mention them in the comments section and we will get back to you.
