How to connect Spark to a remote Hive server

0 votes
Can anyone help me understand how I can load data from a Hive server, which is installed remotely, into a Spark DataFrame? Do I need a Hive JDBC connector?
May 17, 2018 in Big Data Hadoop by code799
12,095 views

3 answers to this question.

0 votes
Use org.apache.spark.sql.hive.HiveContext and you can run queries against Hive.

But I would suggest connecting Spark to HDFS and performing the analytics over the stored data directly. That is much more efficient than connecting Spark to Hive and then running the analysis through it.
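Note that on Spark 2.x and later, HiveContext is deprecated in favor of SparkSession with Hive support enabled. A minimal sketch, assuming Spark 2.x+ with Hive libraries on the classpath (the database and table names are placeholders):

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: assumes hive-site.xml is on the classpath so Spark
// can find the remote metastore.
val spark = SparkSession.builder()
  .appName("hive app")
  .enableHiveSupport()
  .getOrCreate()

// "my_db" and "my_table" are placeholder names.
val df = spark.sql("SELECT * FROM my_db.my_table")
df.show()
```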
answered May 17, 2018 by Shubham
• 13,490 points
+1 vote

JDBC is not required here.

Create a Hive-aware SQLContext (a HiveContext) as below; this works for me:


val conf = new org.apache.spark.SparkConf().setAppName("hive app")
val sc = new org.apache.spark.SparkContext(conf)
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

// dbname must be defined first, e.g. val dbname = "default"
val df1 = sqlContext.sql(s"USE $dbname")
val dfUnion1 = sqlContext.sql("SELECT * FROM table_name")


answered Mar 8, 2019 by Vijay Dixon
• 190 points
0 votes

Hi,

JDBC is not required.

HiveServer2 does ship a JDBC driver, which supports both embedded and remote access to HiveServer2. Remote HiveServer2 mode is recommended for production use, as it is more secure and does not require direct HDFS/metastore access to be granted to users. From Spark, however, you only need to point at the Hive metastore:

  • Put hive-site.xml on your classpath, and set hive.metastore.uris to where your Hive metastore is hosted.
  • Import org.apache.spark.sql.hive.HiveContext, which can run SQL queries over Hive tables.
  • Define val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc).
  • Run sqlContext.sql("show tables") to verify that it works.
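For reference, a minimal hive-site.xml fragment for the first step might look like this (the hostname is a placeholder; 9083 is the conventional metastore Thrift port):

```xml
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <!-- placeholder host; 9083 is the default metastore Thrift port -->
    <value>thrift://metastore-host:9083</value>
  </property>
</configuration>
```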

answered Jul 30, 2019 by Gitika
• 65,910 points
