Which performs better for joins in Spark: a Spark SQL query or the Dataset/DataFrame API?

I'm curious: when I read data from HBase and analyze it with Spark, which is faster, a Spark SQL join or a DataFrame join?
Apr 19, 2018 in Apache Spark by Ashish

1 answer to this question.

DataFrames and Spark SQL performed about the same, although Spark SQL had a slight advantage in analyses involving aggregation and sorting.
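One way to check this on your own data is to compare the physical plans Spark produces for each version of the join. Both APIs go through the same Catalyst optimizer, so equivalent queries generally compile to very similar plans. The sketch below is only an illustration: the `orders` and `customers` tables, their columns, and the helper names (`normalize_plan`, `plan_of`, `compare_join_plans`) are hypothetical stand-ins for data you would load from HBase, and it assumes `DataFrame.explain()` writes its output through Python's `print`, as PySpark does.

```python
import contextlib
import io
import re


def normalize_plan(plan_text):
    """Strip per-query IDs (e.g. `id#12L`, `plan_id=42`) so plans can be compared as text."""
    return re.sub(r"(?:plan_id=|#)\d+L?", "", plan_text).strip()


def plan_of(df):
    """Capture the plan text that DataFrame.explain() prints to stdout."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        df.explain()
    return normalize_plan(buf.getvalue())


def compare_join_plans(spark):
    """Build the same inner join through both APIs and return both physical plans."""
    # Hypothetical sample data standing in for tables loaded from HBase.
    orders = spark.createDataFrame(
        [(1, 10, 25.0), (2, 11, 40.0)], ["order_id", "customer_id", "amount"]
    )
    customers = spark.createDataFrame([(10, "Ann"), (11, "Bob")], ["customer_id", "name"])
    orders.createOrReplaceTempView("orders")
    customers.createOrReplaceTempView("customers")

    # The same join, once as a SQL query and once through the DataFrame API.
    sql_join = spark.sql("SELECT * FROM orders JOIN customers USING (customer_id)")
    api_join = orders.join(customers, "customer_id")
    return plan_of(sql_join), plan_of(api_join)


def main():
    # Imported here so the helpers above can be loaded without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("join-plan-check").getOrCreate()
    sql_plan, api_plan = compare_join_plans(spark)
    print(sql_plan)
    print(api_plan)
    print("plans match:", sql_plan == api_plan)
    spark.stop()
```

Call `main()` to print both plans side by side. If the plans differ for your real queries, that difference is a better place to look than the choice of API itself.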

Hope this helps

answered Apr 19, 2018 by kurt_cobain
