Hi,
I am using the Hive JDBC driver to load data from Hive into my Spark application:
Dataset<Row> dataset = spark.read()
.format("jdbc")
.option("url", "jdbc:hive2://<url with serviceDiscovery=zookeeper")
.option("user","<user_name>")
.option("dbtable", "((select * from schema.table_name limit 30) tbl)")
.option("fetchsize","30")
.load();
Note: schema.table_name has three columns, named col1, col2, and col3.
dataset.show();
This gives me a DataFrame whose column names are tbl.col1, tbl.col2, and tbl.col3, i.e. the subquery alias tbl is prepended to every column name.
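To make this concrete, here is a quick sanity check I can run right after the load (the printed names in the comment are inferred from the show() output and the error below, so treat them as illustrative):
// Sanity check: the subquery alias "tbl" is baked into each flat column name.
// Prints: [tbl.col1, tbl.col2, tbl.col3]
System.out.println(java.util.Arrays.toString(dataset.columns()));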
Then I register a global temp view from it (global temp views live in the global_temp database, which is why the queries below reference global_temp.myTempTable).
dataset.createOrReplaceGlobalTempView("myTempTable");
Then I run my custom SQL against myTempTable:
Dataset<Row> myNewDataset = dataset.sqlContext().sql("select tbl.col1 from global_temp.myTempTable");
But it throws an error:
org.apache.spark.sql.AnalysisException: cannot resolve '`tbl.col1`' given input columns: [mytemptable.tbl.col1, mytemptable.tbl.col2, mytemptable.tbl.col3]; line 1 pos 7;
'Project ['tbl.col1]
+- SubqueryAlias mytemptable
+- Relation[tbl.col1#0,tbl.col2#1,tbl.col3#2] JDBCRelation(((select * from schema.table_name limit 30) tbl)) [numPartitions=1]
Note: The following command works completely fine and gives me the same results as dataset.show():
Dataset<Row> myNewDataset = dataset.sqlContext().sql("select * from global_temp.myTempTable");
Please help me: how can I run a select query on the temp view that picks specific columns, when the column names themselves contain a "." (so a fully qualified reference ends up with two "." characters in it)? I have also tried a query like "select myTempTable.tbl.col1 from myTempTable", but it still doesn't work.
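For anyone who wants to reproduce this, here is a minimal self-contained sketch of the whole flow, built only from the snippets above (the URL, user, and table names are placeholders, so it assumes a reachable HiveServer2 and the Hive JDBC driver on the classpath):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HiveJdbcTempViewRepro {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("hive-jdbc-temp-view-repro")
                .getOrCreate();

        // Load 30 rows over the Hive JDBC driver; the subquery alias "tbl"
        // ends up prepended to every column name (tbl.col1, tbl.col2, tbl.col3).
        Dataset<Row> dataset = spark.read()
                .format("jdbc")
                .option("url", "jdbc:hive2://<url with serviceDiscovery=zookeeper>")
                .option("user", "<user_name>")
                .option("dbtable", "((select * from schema.table_name limit 30) tbl)")
                .option("fetchsize", "30")
                .load();

        dataset.createOrReplaceGlobalTempView("myTempTable");

        // Works, and shows the same rows as dataset.show():
        spark.sql("select * from global_temp.myTempTable").show();

        // Throws org.apache.spark.sql.AnalysisException: cannot resolve '`tbl.col1`':
        spark.sql("select tbl.col1 from global_temp.myTempTable").show();
    }
}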