Unable to run select query with selected columns on a temp view registered in spark application

0 votes

Hi,

I am using Hive JDBC to load data from Hive into my Spark application.

Dataset<Row> dataset = spark.read()
    .format("jdbc")
    .option("url", "jdbc:hive2://<url with serviceDiscovery=zookeeper>")
    .option("user", "<user_name>")
    .option("dbtable", "((select * from schema.table_name limit 30) tbl)")
    .option("fetchsize", "30")
    .load();


Note: schema.table_name has columns named col1, col2, and col3.

dataset.show()

This gives me a DataFrame with column names tbl.col1, tbl.col2, and tbl.col3.

Then I register a global temp view from it:

dataset.createOrReplaceGlobalTempView("myTempTable");

Then I run my custom SQL on myTempTable:

Dataset<Row> myNewDataset = dataset.sqlContext().sql("select tbl.col1 from global_temp.myTempTable");

But it throws an error:
org.apache.spark.sql.AnalysisException: cannot resolve '`tbl.col1`' given input columns: [mytemptable.tbl.col1, mytemptable.tbl.col2, mytemptable.tbl.col3]; line 1 pos 7;
'Project ['tbl.col1]
+- SubqueryAlias mytemptable
   +- Relation[tbl.col1#0,tbl.col2#1,tbl.col3#2] JDBCRelation(((select * from schema.table_name limit 30) tbl)) [numPartitions=1]


Note: This command works completely fine and gives me the same results as dataset.show():

Dataset<Row> myNewDataset = dataset.sqlContext().sql("select * from global_temp.myTempTable");

Please help me: how can I run a select query with specific columns on the temp view, given that the column names contain a "." in them? I have also tried "select myTempTable.tbl.col1 from myTempTable", but it still doesn't work.


Mar 26 in Apache Spark by sid
• 120 points
433 views

I don't know whether it will work or not, but try dataset.tbl.col1 instead of tbl.col1.

I have tried it; it gives the same error.
Can you share the schema of your dataset?

1 answer to this question.

0 votes
from pyspark.sql.types import FloatType
fname = [1.0, 2.4, 3.6, 4.2, 45.4]
df = spark.createDataFrame(fname, FloatType())
df.show()
+-----+
|value|
+-----+
|  1.0|
|  2.4|
|  3.6|
|  4.2|
| 45.4|
+-----+

df.registerTempTable("my_test_tbl")
df_res = spark.sql("select * from my_test_tbl")
df_res.show()
+-----+
|value|
+-----+
|  1.0|
|  2.4|
|  3.6|
|  4.2|
| 45.4|
+-----+

df_res = spark.sql("select value from my_test_tbl")
df_res.show()
+-----+
|value|
+-----+
|  1.0|
|  2.4|
|  3.6|
|  4.2|
| 45.4|
+-----+
answered Mar 28 by GAURAV
• 140 points
I was going through your code. How is it related to the query given above? Can you explain?
