
Hive Data Models

Last updated on May 22, 2019

Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems. Hive structures data into well-understood database concepts such as tables, rows, columns and partitions. It supports primitive types like Integers, Floats, Doubles, and Strings. Hive also supports complex types such as Associative Arrays (maps), Lists (arrays), and Structs. A Serialize/Deserialize (SerDe) API is used to move data in and out of tables.

Let’s look at Hive Data Models in detail.

Hive Data Models:

The Hive data models contain the following components:

  • Databases
  • Tables
  • Partitions
  • Buckets or clusters


Partitioning means dividing a table into coarse-grained parts based on the value of a partition column such as ‘date’. This makes queries on slices of the data faster, because only the matching partitions need to be read.


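So, what is the function of a partition? The partition keys determine how the data is stored: each unique value of the partition key defines one partition of the table, and Hive keeps each partition in its own subdirectory under the table's directory. Here, the partitions are named after dates for convenience. The idea is loosely similar to ‘Block Splitting’ in HDFS, in that a large dataset is divided into smaller pieces that can be read independently.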
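To make the idea concrete, here is a minimal Python sketch of partitioning. It is not Hive code and does not use any Hive API. The sample rows, the `partition_rows` and `scan` helpers, and the column names are all made up for illustration. It only mimics the way each unique value of the partition column ends up in its own `column=value` group, so that a query on one date touches one group instead of the whole table.

```python
from collections import defaultdict


def partition_rows(rows, key):
    """Group rows by the value of the partition column (hypothetical helper)."""
    partitions = defaultdict(list)
    for row in rows:
        # Each group is labelled "<column>=<value>", like a partition directory.
        partitions[f"{key}={row[key]}"].append(row)
    return dict(partitions)


def scan(partitions, key, value):
    """Read only the partition that matches the filter, skipping the rest."""
    return partitions.get(f"{key}={value}", [])


# Made-up sample data for illustration only.
rows = [
    {"user_id": 1, "page": "home", "date": "2019-05-21"},
    {"user_id": 2, "page": "cart", "date": "2019-05-21"},
    {"user_id": 1, "page": "home", "date": "2019-05-22"},
]

partitions = partition_rows(rows, "date")
print(sorted(partitions))                            # one label per distinct date
print(len(scan(partitions, "date", "2019-05-22")))   # rows read for that date
```

A query that filters on the partition column only has to look inside the matching group, which is why partitioning speeds up queries on slices of the data.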


Buckets give extra structure to the data that may be used for more efficient queries. A join of two tables that are bucketed on the same columns, including the join column, can be implemented as a Map-Side Join. Bucketing by user ID means we can quickly evaluate a user-based query by running it on a randomized sample of the total set of users.
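The following Python sketch illustrates both points under some simplifying assumptions. It is not Hive code: the bucket count of 4, the `bucket_of`, `bucketize` and `bucketed_join` helpers, and the sample tables are all hypothetical, and a plain modulo on the integer user ID stands in for the hash function. It shows that when two tables are bucketed the same way on the join column, bucket i of one table only ever needs to be matched against bucket i of the other, and that a single bucket can serve as a sample of users.

```python
NUM_BUCKETS = 4  # hypothetical bucket count, chosen for illustration


def bucket_of(user_id, num_buckets=NUM_BUCKETS):
    """Assign a row to a bucket; modulo on the id stands in for a hash."""
    return user_id % num_buckets


def bucketize(rows, num_buckets=NUM_BUCKETS):
    """Split rows into buckets by user_id."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_of(row["user_id"], num_buckets)].append(row)
    return buckets


def bucketed_join(left_buckets, right_buckets):
    """Join bucket i of the left table only against bucket i of the right."""
    joined = []
    for left, right in zip(left_buckets, right_buckets):
        # Build a small in-memory lookup for this bucket only.
        lookup = {}
        for r in right:
            lookup.setdefault(r["user_id"], []).append(r)
        for l in left:
            for r in lookup.get(l["user_id"], []):
                joined.append({**l, **r})
    return joined


# Made-up sample tables for illustration only.
users = [{"user_id": i, "name": f"user{i}"} for i in range(8)]
visits = [{"user_id": i % 8, "page": f"p{i}"} for i in range(12)]

user_buckets = bucketize(users)
visit_buckets = bucketize(visits)

joined = bucketed_join(user_buckets, visit_buckets)

# Reading one bucket gives a sample of the users without a full scan.
sample = user_buckets[0]
```

Because matching user IDs always land in the same bucket number on both sides, each pair of buckets can be joined independently, which is the property a map-side join relies on.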


Got a question for us? Please mention it in the comments section and we will get back to you.
