Unable to run select query with selected columns on a temp view registered in spark application

0 votes

Hi,

I am using Hive JDBC to load data from Hive into my Spark application.

Dataset<Row> dataset = spark.read()
    .format("jdbc")
    .option("url", "jdbc:hive2://<url with serviceDiscovery=zookeeper>")
    .option("user", "<user_name>")
    .option("dbtable", "((select * from schema.table_name limit 30) tbl)")
    .option("fetchsize", "30")
    .load();


Note: schema.table_name has columns named col1, col2, and col3.

dataset.show()

This gives me a DataFrame with column names tbl.col1, tbl.col2, and tbl.col3.

Then I register a global temp view from it:

dataset.createOrReplaceGlobalTempView("myTempTable");

Then I run my custom SQL on myTempTable:

Dataset<Row> myNewDataset = dataset.sqlContext().sql("select tbl.col1 from global_temp.myTempTable");

But it throws an error:
org.apache.spark.sql.AnalysisException: cannot resolve '`tbl.col1`' given input columns: [mytemptable.tbl.col1, mytemptable.tbl.col2, mytemptable.tbl.col3]; line 1 pos 7;
'Project ['tbl.col1]
+- SubqueryAlias mytemptable
   +- Relation[tbl.col1#0,tbl.col2#1,tbl.col3#2] JDBCRelation(((select * from schema.table_name limit 30) tbl)) [numPartitions=1]


Note: This command works completely fine and gives me the same results as dataset.show():

Dataset<Row> myNewDataset = dataset.sqlContext().sql("select * from global_temp.myTempTable");

Please help me: how can I run a select query with specific columns on the temp view, given that the column names contain a "." in them? I have also tried "select myTempTable.tbl.col1 from myTempTable", but it still doesn't work.


Mar 26 in Apache Spark by sid
• 120 points
433 views

I don't know whether it will work or not, but try dataset.tbl.col1 instead of tbl.col1.

I have tried it; it gives the same error.
Can you share the schema of your dataset?

1 answer to this question.

0 votes
from pyspark.sql.types import FloatType
fname = [1.0, 2.4, 3.6, 4.2, 45.4]
df = spark.createDataFrame(fname, FloatType())
df.show()
+-----+
|value|
+-----+
|  1.0|
|  2.4|
|  3.6|
|  4.2|
| 45.4|
+-----+

df.registerTempTable("my_test_tbl")
df_res = spark.sql("select * from my_test_tbl")
df_res.show()
+-----+
|value|
+-----+
|  1.0|
|  2.4|
|  3.6|
|  4.2|
| 45.4|
+-----+

df_res = spark.sql("select value from my_test_tbl")
df_res.show()
+-----+
|value|
+-----+
|  1.0|
|  2.4|
|  3.6|
|  4.2|
| 45.4|
+-----+
answered Mar 28 by GAURAV
• 140 points
I was going through your code. How is it related to the query given above? Can you explain?
