Which gives better performance: a join in Spark SQL or the Dataset API?

0 votes
I'm a bit curious: when I'm working with data from HBase and doing analysis in Spark, which one is faster, a Spark SQL join or a DataFrame join?
Apr 19, 2018 in Apache Spark by Ashish

1 answer to this question.

0 votes

DataFrames and Spark SQL perform about the same, because both are compiled by the Catalyst optimizer into the same underlying execution plan. In analyses involving aggregation and sorting, Spark SQL has shown a slight advantage in some benchmarks, but for a plain join you can expect essentially identical performance from either API.
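A quick way to check this yourself is to express the same join both ways and compare the physical plans with explain(). The sketch below is a minimal, self-contained example: the users/orders tables are hypothetical in-memory data standing in for whatever you actually load from HBase.

```scala
import org.apache.spark.sql.SparkSession

object JoinComparison {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sql-vs-dataframe-join")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for tables read from HBase.
    val users  = Seq((1, "alice"), (2, "bob")).toDF("id", "name")
    val orders = Seq((1, 250.0), (2, 99.0)).toDF("user_id", "amount")

    users.createOrReplaceTempView("users")
    orders.createOrReplaceTempView("orders")

    // The join expressed in Spark SQL.
    val sqlJoin = spark.sql(
      "SELECT u.name, o.amount FROM users u JOIN orders o ON u.id = o.user_id")

    // The same join expressed with the DataFrame/Dataset API.
    val dfJoin = users.as("u")
      .join(orders.as("o"), $"u.id" === $"o.user_id")
      .select($"u.name", $"o.amount")

    // Both queries go through the Catalyst optimizer; printing the
    // physical plans lets you see that they are effectively identical.
    sqlJoin.explain()
    dfJoin.explain()

    spark.stop()
  }
}
```

If the two plans printed by explain() match (and for a simple equi-join like this they should), the choice between SQL and the DataFrame/Dataset API is mostly a matter of readability rather than speed.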

Hope this helps

answered Apr 19, 2018 by kurt_cobain
