Which gives better join performance in Spark: a SQL query or the Dataset API?

0 votes
I'm curious: when I read data from HBase and analyze it with Spark, which is faster, a Spark SQL join or a DataFrame join?
Apr 19, 2018 in Apache Spark by Ashish
• 2,630 points
159 views

1 answer to this question.

0 votes

DataFrame joins and Spark SQL joins perform about the same, since both go through the same Catalyst optimizer and execution engine. For analysis involving aggregation and sorting, Spark SQL has been observed to have a slight advantage.

Hope this helps

answered Apr 19, 2018 by kurt_cobain
• 9,290 points
