Spark Vs Hive LLAP Question

0 votes
I have done a lot of research on Hive and Spark SQL. I still don't understand why Spark SQL is needed to build applications when Hive already does everything using execution engines like Tez, Spark, and LLAP. Note: LLAP is much faster than the other execution engines.

Spark SQL connects to Hive using HiveContext and does not support any transactions.

Hive handles all the transactions, even when it runs over Spark as the execution engine.
Jul 16 in Big Data Hadoop by Vishnu
576 views

1 answer to this question.

0 votes

While Apache Hive and Spark SQL perform the same action, retrieving data, each does the task in a different way. Hive is designed as an interface for conveniently querying data stored in HDFS, whereas a traditional database such as MySQL is designed for online operations requiring many reads and writes.

Apache Hive:
Apache Hive is an open-source data warehouse system built on top of Hadoop. It helps in analyzing and querying large datasets stored in Hadoop files. Without Hive, we would have to write complex MapReduce jobs; with Hive, we merely submit SQL queries. Hive is mainly targeted at users who are comfortable with SQL.

Spark SQL:
In Spark, we use Spark SQL for structured data processing. It gives Spark more information about the structure of the data and about the computations being performed, and Spark can use this extra information to perform additional optimizations. Interaction with Spark SQL is possible in several ways, such as through the DataFrame and Dataset APIs.

Usage
Apache Hive:

  • Schema flexibility and evolution.
  • Tables in Apache Hive can be partitioned and bucketed.
  • JDBC/ODBC drivers are available for Hive, so we can use it from external tools.

Spark SQL:

  • Basically, it executes SQL queries.
  • Through Spark SQL, it is possible to read data from an existing Hive installation.
  • When we run SQL from within another programming language, we get the result as a Dataset/DataFrame.

Limitations
Apache Hive:

  • It does not offer real-time queries or row-level updates.
  • It does not provide acceptable latency for interactive data browsing.
  • Hive does not support online transaction processing.
  • In Apache Hive, the latency for queries is generally very high.

Spark SQL:

  • It does not support the union type.
  • It raises no error for oversized varchar values.
  • It does not support transactional tables.
  • It does not support the char type.
  • It does not support timestamps in Avro tables.

Conclusion
Hence, we cannot say that Spark SQL is a replacement for Hive, nor the other way around. Spark SQL is more Spark-API and developer friendly, and SQL makes programming in Spark easier, while Hive's ability to switch execution engines makes it efficient for querying huge data sets. In short, which one to use depends entirely on our goals. The usage and limitations of each are discussed above.

answered Jul 16 by Karan
