Comprehensive HIVE (4 Blogs) Become a Certified Professional

Introduction to Apache Hive

Last updated on Jun 05,2023 12K Views


Apache Hive is a Data Warehousing package built on top of Hadoop and is used for data analysis. Hive is targeted towards users who are comfortable with SQL. It is similar to SQL and called HiveQL, used for managing and querying structured data. Apache Hive is used to abstract complexity of Hadoop. This language also allows traditional map/reduce programmers to plug in their custom mappers and reducers. The popular feature of Hive is that there is no need to learn Java.

Hive, an open source peta-byte scale date warehousing framework based on Hadoop, was developed by the Data Infrastructure Team at Facebook. Hive is also one of the technologies that are being used to address the requirements at Facebook. Hive is  very popular with all the users internally at Facebook and is being used to run thousands of jobs on the cluster with hundreds of users, for a wide variety of applications. Hive-Hadoop cluster at Facebook stores more than 2PB of raw data and regularly loads 15 TB of data on a daily basis.

Let’s look at some of its features that makes it popular and user friendly:

  • Allows programmers to plug in custom Mappers and Reducers.
  • Has Data Warehouse infrastructure.
  • Provides tools to enable easy data ETL.
  • Defines SQL-like query language called QL.

Apache Hive Use Case – Facebook: 

Hive Use Case – Facebook

Before implementing Hive, Facebook faced a lot of challenges as the size of data being generated increased or rather exploded, making it really difficult to handle them. The traditional RDBMS couldn’t handle the pressure and as a result Facebook was looking out for better options. To solve this impending issue, Facebook initially tried using Hadoop MapReduce, but with difficulty in programming and mandatory knowledge in SQL, made it an impractical solution. Hive allowed them to overcome the challenges they were facing.

With Hive, they are now able to perform the following:

  • Tables can be portioned and bucketed
  • Schema flexibility and evolution
  • JDBC/ODBC drivers are available
  • Hive tables can be defined directly in the HDFS
  • Extensible – Types, Formats, Functions and scripts

Hive Use Case in Healthcare:

Hive Use Case in Healthcare

Where to Use Hive?

Where to Use Hive

Apache Hive can be used in the following places:

  • Data Mining
  • Log Processing
  • Document Indexing
  • Customer Facing Business Intelligence
  • Predictive Modelling
  • Hypothesis Testing

Hive Architecture:

Hive Architecture

Hive consists of the following major components:

  • Metastore – To store the metadata.
  • JDBC/ODBC – Query Compiler and Execution Engine to convert SQL queries to a sequence of MapReduce.
  • SerDe and ObjectInspectors – For data formats and types.
  • UDF/UDAF – For User Defined Functions.
  • Clients – Similar to MySQL  command line and a web UI.

Become a master of data architecture and shape the future with our comprehensive Data Architect Certification.

Components of Hive:

Components of Hive

Metastore:

The Metastore stores the information about the tables, partitions, the columns within the tables. There are 3 ways of storing in Metastore: Embedded Metastore, Local Metastore and Remote Metastore. Mostly, Remote Metastore will be used in production mode.

Metastore

 Limitations of Hive:

Limitations of Hive

Hive has the following limitations and cannot be used  under such circumstances:

  • Not designed for online transaction processing.
  • Provides acceptable latency for interactive data browsing.
  • Does not offer real-time queries and row level updates.
  • Latency for Hive queries is generally very high.

Got a question for us? Mention them in the comments section and we will get back to you.

Related Posts:

Big Data and Hadoop Training

Hive Commands

Pig Vs Hive

Comments
2 Comments
  • Hareesh@Disqus says:

    Hi Team

    I have question regarding Hive metastore.
    the default metastore is apache derby. In production cluster environment derby is not used.
    MyQuestion:
    If i go for mysql as metastore. where mysql Installed in cluster (NameNode or DataNode)?

    • EdurekaSupport says:

      Hi Hareesha,
      Thank you for reaching out to us.
      You will have 24/7 access to our support team as well as instructor assistance for resolving all your queries once you enroll for our course.
      You can click here to enroll: https://www.edureka.co/big-data-hadoop-training-certification
      You can get in touch with us for further clarification about opportunities in Hadoop by contacting our sales team on +91-8880862004 (India) or 1800 275 9730 (US toll free). You can also mail us on sales@edureka.co.

Join the discussion

Browse Categories

webinar REGISTER FOR FREE WEBINAR
REGISTER NOW
webinar_success Thank you for registering Join Edureka Meetup community for 100+ Free Webinars each month JOIN MEETUP GROUP

Subscribe to our Newsletter, and get personalized recommendations.

image not found!
image not found!

Introduction to Apache Hive

edureka.co