What is Hive? Is Hive a database?

Question

I am new to Hive. I found it similar to an RDBMS: it has tables, joins and partitions. As I understand it, Hive uses HDFS for storing data and provides a SQL abstraction over HDFS. Is Hive a database over HDFS, like HBase, or is it a query tool over HDFS?

But I doubt that Hive is just a query tool, since it has tables, joins and partitions.

nitinrawat895 · Answer 1 · Mar 16, 2018

No, we cannot call Apache Hive a relational database. It is a data warehouse built on top of Apache Hadoop that provides data summarization, querying, and analysis. It differs from a relational database in that it stores the schema (table metadata) in a separate database, the metastore, while the data itself is stored in HDFS.
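
To make that split between metadata and data concrete: by default a managed Hive table is just a directory under the warehouse location (`/user/hive/warehouse`, set by `hive.metastore.warehouse.dir`), and each partition is a `key=value` subdirectory. That is also why a filter such as `WHERE dt = '2019-05-08'` on a partition column lets Hive skip the other directories. The sketch below only models this layout; the `TableMeta` class standing in for the metastore, the `web` database, the `weblogs` table and the `dt` partition column are all hypothetical, and real Hive resolves locations through the metastore rather than by building strings.

```python
from dataclasses import dataclass, field
from typing import Optional

# Default parent directory of managed tables; Hive reads it from the
# hive.metastore.warehouse.dir setting.
WAREHOUSE_DIR = "/user/hive/warehouse"


@dataclass
class TableMeta:
    """Hypothetical stand-in for what the metastore records about a
    partitioned table: schema and partition values. Rows live in HDFS files."""
    database: str
    name: str
    columns: list[tuple[str, str]]  # (column name, Hive type)
    partition_key: str
    partitions: list[str] = field(default_factory=list)

    def location(self) -> str:
        # Tables in the 'default' database sit directly under the warehouse
        # directory; other databases get a '<db>.db' subdirectory.
        if self.database == "default":
            return f"{WAREHOUSE_DIR}/{self.name}"
        return f"{WAREHOUSE_DIR}/{self.database}.db/{self.name}"

    def partition_path(self, value: str) -> str:
        # Each partition is a 'key=value' subdirectory of the table directory.
        return f"{self.location()}/{self.partition_key}={value}"

    def paths_to_scan(self, wanted: Optional[str] = None) -> list[str]:
        # Partition pruning: with a filter on the partition column, only the
        # matching directories need to be read.
        values = [v for v in self.partitions if wanted is None or v == wanted]
        return [self.partition_path(v) for v in values]


logs = TableMeta("web", "weblogs", [("ip", "string"), ("url", "string")],
                 "dt", ["2019-05-07", "2019-05-08"])
print(logs.paths_to_scan("2019-05-08"))
# ['/user/hive/warehouse/web.db/weblogs/dt=2019-05-08']
```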

For processing, Hive provides a SQL-like interface to query data stored in the various databases and file systems that integrate with Hadoop. Queries are written in a language called HiveQL, and Hive automatically translates them into MapReduce jobs that run on Hadoop (newer Hive versions can also run them on Tez or Spark).
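
As a rough illustration of that translation, a query such as `SELECT url, COUNT(*) FROM weblogs GROUP BY url` fits into a single map/shuffle/reduce pass: the map phase emits `(url, 1)` for each row, the shuffle brings equal keys together, and the reduce phase sums each group. The Python below simulates those three phases in one process. It is a teaching sketch, not Hive's query planner, and the table, columns and rows are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical rows of a 'weblogs' table: (ip, url)
ROWS = [
    ("10.0.0.1", "/home"),
    ("10.0.0.2", "/cart"),
    ("10.0.0.3", "/home"),
    ("10.0.0.1", "/home"),
]


def map_phase(rows: Iterable[tuple[str, str]]) -> Iterator[tuple[str, int]]:
    # Mapper: emit (GROUP BY key, 1) for every input row.
    for _ip, url in rows:
        yield url, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle/sort: collect all values for the same key, sorted by key
    # (on a real cluster this step moves data between nodes).
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    # Reducer: apply the aggregate, COUNT(*) here, to each group.
    return {key: sum(values) for key, values in groups.items()}


result = reduce_phase(shuffle(map_phase(ROWS)))
print(result)  # {'/cart': 1, '/home': 3}
```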

Hive is optimized for reading data and is therefore not suited to transaction processing, which typically involves a high proportion of write operations. It is best suited for batch jobs such as web log processing and is designed for OLAP workloads.

Hi, here you mentioned "stores schema in a database". What can that database be? Something like SQL Server?

commented May 2, 2019 by Sai Krishna Mamidi

Hi @Sai.

By default, the schema (the Hive metastore) is stored in an embedded Apache Derby database, but it is possible to change this to MySQL or PostgreSQL.
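
The metastore database is selected in `hive-site.xml` through the JDO connection properties `javax.jdo.option.ConnectionURL`, `javax.jdo.option.ConnectionDriverName`, `javax.jdo.option.ConnectionUserName` and `javax.jdo.option.ConnectionPassword`. The Python sketch below renders those properties for Derby, MySQL or PostgreSQL. The host names, the `metastore` database name and the credentials are placeholders; the matching JDBC driver JAR still has to be on Hive's classpath, and for MySQL or PostgreSQL the schema is usually initialised afterwards with Hive's `schematool`.

```python
import xml.etree.ElementTree as ET

# JDBC driver class and URL template per metastore backend.
# Hosts, ports and the 'metastore' database name are placeholders.
BACKENDS = {
    # Embedded Derby: Hive's out-of-the-box default, suitable for one user.
    "derby": ("org.apache.derby.jdbc.EmbeddedDriver",
              "jdbc:derby:;databaseName=metastore_db;create=true"),
    # MySQL Connector/J 8.x (older 5.x drivers use com.mysql.jdbc.Driver).
    "mysql": ("com.mysql.cj.jdbc.Driver",
              "jdbc:mysql://{host}:3306/metastore"),
    "postgres": ("org.postgresql.Driver",
                 "jdbc:postgresql://{host}:5432/metastore"),
}


def metastore_config(backend: str, host: str = "localhost",
                     user: str = "hive", password: str = "changeme") -> str:
    """Return a <configuration> block with the javax.jdo.option.* properties."""
    driver, url = BACKENDS[backend]
    props = {
        "javax.jdo.option.ConnectionURL": url.format(host=host),
        "javax.jdo.option.ConnectionDriverName": driver,
    }
    if backend != "derby":
        # Server databases need an account for Hive; the embedded Derby
        # default is left to Hive's built-in settings.
        props["javax.jdo.option.ConnectionUserName"] = user
        props["javax.jdo.option.ConnectionPassword"] = password

    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-printing; needs Python 3.9+
    return ET.tostring(root, encoding="unicode")


print(metastore_config("mysql", host="db.example.com"))
```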

commented May 2, 2019 by Omkar

Gitika · Answer 2 · May 8, 2019

Hey,

HIVE: Hive is an ETL (extract, transform, load) and data warehouse tool built on top of the Hadoop Distributed File System. In Hive, databases and tables are created first, and then data is loaded into these tables. As a data warehouse, Hive is designed only for managing and querying structured data stored in tables.
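
One consequence of "create the table, then load the data" is that Hive does not validate rows at load time: `LOAD DATA` essentially moves files into the table's directory, and the table definition is applied when the files are read ("schema on read"). For a delimited text table, a field that cannot be parsed as its declared type comes back as NULL instead of failing the load. The sketch below mimics that behaviour for a hypothetical comma-delimited table; the column list and the simplified type parsing are assumptions, not Hive's SerDe code.

```python
from typing import Callable, Optional

# Hypothetical table definition: (column name, parser for the declared type).
SCHEMA: list[tuple[str, Callable[[str], object]]] = [
    ("user_id", int),
    ("page", str),
    ("duration_sec", float),
]

# "Loading" stores the raw lines untouched, like files copied into HDFS.
raw_file = [
    "1,/home,3.5",
    "2,/cart,abc",  # bad number: reads back as None (NULL)
    "x,/home",      # bad id and a missing trailing field
]


def read_row(line: str, delimiter: str = ",") -> dict[str, Optional[object]]:
    """Apply the schema at read time; unparsable or missing fields become None."""
    fields = line.split(delimiter)
    row: dict[str, Optional[object]] = {}
    for i, (name, parse) in enumerate(SCHEMA):
        try:
            row[name] = parse(fields[i])
        except (IndexError, ValueError):
            row[name] = None
    return row


for line in raw_file:
    print(read_row(line))
```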

The main difference between HiveQL and SQL is that a Hive query executes on Hadoop's infrastructure rather than on a traditional database. Hive runs a query as a series of automatically generated MapReduce jobs.
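
The word "series" matters for queries that need more than one shuffle. For example, `SELECT u.country, COUNT(*) FROM weblogs l JOIN users u ON l.user_id = u.id GROUP BY u.country` groups first by user id (for the join) and then by country (for the count), so without optimisations such as map-side joins it typically becomes two chained jobs, the second reading the output the first wrote to HDFS. The Python sketch below chains a reduce-side join and an aggregation to show that shape. The tables and the `run_job` helper are hypothetical, and the in-memory grouping stands in for Hadoop's shuffle.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable

# Hypothetical tables.
weblogs = [(1, "/home"), (2, "/cart"), (1, "/cart"), (3, "/home")]  # (user_id, url)
users = [(1, "IN"), (2, "US"), (3, "IN")]                          # (id, country)


def run_job(records: Iterable, mapper: Callable, reducer: Callable) -> list:
    """One simulated MapReduce job: map each record, group by key, reduce each group."""
    groups: dict[Hashable, list] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Job 1: reduce-side join on user id; each record is tagged with its table.
def join_mapper(record):
    table, row = record
    yield row[0], (table, row[1])


def join_reducer(user_id, values):
    countries = [v for t, v in values if t == "users"]
    urls = [v for t, v in values if t == "weblogs"]
    for country in countries:
        for url in urls:
            yield (country, url)


# Job 2: GROUP BY country with COUNT(*), reading job 1's output.
def count_mapper(record):
    country, _url = record
    yield country, 1


def count_reducer(country, ones):
    yield (country, sum(ones))


tagged = [("weblogs", r) for r in weblogs] + [("users", r) for r in users]
joined = run_job(tagged, join_mapper, join_reducer)    # would be written to HDFS
counts = run_job(joined, count_mapper, count_reducer)  # second job reads it back
print(counts)  # [('IN', 3), ('US', 1)]
```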

Hive also makes some things practical that are hard to achieve with a relational database. Data in the petabyte range cannot be queried in reasonable time on a single server, and Hive handles it efficiently by splitting each query into tasks that run in parallel across the cluster. As a batch system, however, it is not designed for the sub-second responses of a traditional database.

Answer 3 · Jul 1, 2019

Hive is a data warehouse infrastructure built on top of Hadoop for querying and analyzing structured data stored in HDFS.

Hope this answers your question.