Introduction to Apache Hive


Apache Hive is a data warehousing package built on top of Hadoop and used for data analysis. It is targeted at users who are comfortable with SQL: its query language, HiveQL, is similar to SQL and is used for managing and querying structured data. Hive abstracts away the complexity of Hadoop, and HiveQL also allows traditional map/reduce programmers to plug in their custom mappers and reducers. A popular feature of Hive is that there is no need to learn Java.

Hive, an open-source, petabyte-scale data warehousing framework based on Hadoop, was developed by the Data Infrastructure Team at Facebook. It is one of the technologies used to address data requirements at Facebook, where it is very popular internally: hundreds of users run thousands of jobs on the cluster for a wide variety of applications. The Hive-Hadoop cluster at Facebook stores more than 2 PB of raw data and loads 15 TB of new data every day.

Let’s look at some of the features that make it popular and user friendly:

Allows programmers to plug in custom Mappers and Reducers.
Provides data warehouse infrastructure.
Provides tools to enable easy data ETL.
Defines a SQL-like query language called HiveQL (QL).
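
To make the "custom Mappers and Reducers" feature more concrete, here is a minimal sketch of a custom mapper written as a streaming script. It relies on the fact that Hive's TRANSFORM clause passes rows to an external script as tab-separated lines on standard input and reads tab-separated lines back from standard output. The column names (user_id, page_url, domain) and the function names are hypothetical and chosen only for illustration.

```python
import sys


def map_line(line):
    """Turn one tab-separated input row into zero or more output rows.

    Assumed (hypothetical) input columns: user_id, page_url.
    Output columns: user_id, domain.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 2:
        return []  # skip malformed rows instead of failing the whole job
    user_id, page_url = fields
    # Drop the scheme if there is one, then keep the text before the first "/".
    without_scheme = page_url.split("://", 1)[-1]
    domain = without_scheme.split("/", 1)[0].lower()
    return ["\t".join((user_id, domain))]


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Rows arrive one per line on stdin; results go to stdout, one per line.
    for line in stdin:
        for out in map_line(line):
            stdout.write(out + "\n")
```

Saved under a hypothetical name such as extract_domain.py, with a final line that calls main(), the script could be registered with ADD FILE and invoked from a query along the lines of SELECT TRANSFORM(user_id, page_url) USING 'python3 extract_domain.py' AS (user_id, domain) FROM page_views; where the table and column names are again assumptions.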

Apache Hive Use Case – Facebook:

Before implementing Hive, Facebook faced a lot of challenges as the volume of data being generated exploded, making it really difficult to handle. The traditional RDBMS couldn’t handle the pressure, and as a result Facebook was looking for better options. To solve this issue, Facebook initially tried using Hadoop MapReduce directly, but the difficulty of programming MapReduce jobs, combined with a user base whose skills were mainly in SQL, made it an impractical solution. Hive allowed them to overcome these challenges.

With Hive, they now have the following capabilities:

Tables can be partitioned and bucketed
Schema flexibility and evolution
JDBC/ODBC drivers are available
Hive tables can be defined directly on data in HDFS
Extensible – Types, Formats, Functions and scripts
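
Partitioning and bucketing are easier to picture with a small model. The sketch below is a simplified illustration, not Hive's implementation: it assumes that each partition maps to a key=value subdirectory under the table's directory, and that a row's bucket is a hash of the bucketing column modulo the number of buckets. The warehouse path, the dt and user_id columns, and all function names are hypothetical, and the integer key is used as its own hash, whereas Hive's real hash function depends on the column type and the Hive version.

```python
from collections import defaultdict


def partition_path(table_dir, partition_spec):
    """Build a partition directory such as .../dt=2024-01-01/country=US."""
    parts = [f"{key}={value}" for key, value in partition_spec]
    return "/".join([table_dir.rstrip("/")] + parts)


def bucket_for(value, num_buckets):
    """Assign an integer key to a bucket using hash-modulo.

    Simplifying assumption: the integer itself serves as the hash value.
    """
    return (value & 0x7FFFFFFF) % num_buckets


def layout(rows, table_dir, num_buckets):
    """Group (dt, user_id) rows by partition directory and bucket number."""
    files = defaultdict(list)
    for dt, user_id in rows:
        path = partition_path(table_dir, [("dt", dt)])
        files[(path, bucket_for(user_id, num_buckets))].append(user_id)
    return dict(files)
```

A query that filters on the partition column only needs to read the matching directories, while bucketing on a column keeps rows with the same key in the same file, which is what makes sampling and some joins cheaper.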


Where to Use Hive?

Apache Hive can be used for the following kinds of work:

Data Mining
Log Processing
Document Indexing
Customer-Facing Business Intelligence
Predictive Modelling
Hypothesis Testing

Hive Architecture:

Hive consists of the following major components:

Metastore – To store the metadata.
Query Compiler and Execution Engine – To convert SQL queries into a sequence of MapReduce jobs.
SerDe and ObjectInspectors – For data formats and types.
UDF/UDAF – For User Defined Functions.
Clients – A command line interface similar to MySQL's, a web UI, and JDBC/ODBC drivers.
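
The role of a SerDe (serializer/deserializer) in the list above can be sketched in a few lines. This is a conceptual illustration rather than Hive's Java SerDe interface: it assumes a plain text table that uses Hive's default field delimiter (Ctrl-A) and default NULL marker (\N), and it takes the column names and types as a list that stands in for the metadata the Metastore would supply. The function names and the three supported types are assumptions made for the example.

```python
FIELD_DELIM = "\x01"  # Ctrl-A, Hive's default field delimiter for text tables
NULL_TOKEN = "\\N"    # how NULL is written in Hive's default text format

# Hypothetical subset of column types handled by this sketch.
CASTS = {"int": int, "double": float, "string": str}


def deserialize(line, schema):
    """Turn one raw text line into a dict of typed column values.

    schema is a list of (column_name, type_name) pairs.
    """
    raw = line.rstrip("\n").split(FIELD_DELIM)
    row = {}
    for (name, type_name), text in zip(schema, raw):
        row[name] = None if text == NULL_TOKEN else CASTS[type_name](text)
    return row


def serialize(row, schema):
    """Turn a dict of column values back into one delimited text line."""
    cells = []
    for name, _ in schema:
        value = row.get(name)
        cells.append(NULL_TOKEN if value is None else str(value))
    return FIELD_DELIM.join(cells)
```

The point of the split is that the stored bytes carry no schema of their own: the same file can be read with a different schema or a different SerDe, which is part of what makes Hive's schema handling flexible.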


Components of Hive:

Metastore:

The Metastore stores information about the tables, their partitions, and the columns within the tables. It can be deployed in three modes: Embedded Metastore, Local Metastore and Remote Metastore. Remote Metastore is the mode mostly used in production.

Limitations of Hive:

Hive has the following limitations and is not a good fit where they matter:

Not designed for online transaction processing.
Provides only acceptable, not optimal, latency for interactive data browsing.
Does not offer real-time queries or row-level updates.
Latency for Hive queries is generally very high.
