Which performs better for joins in Spark: a Spark SQL query or the Dataset API?

I'm curious: when I read data from HBase and analyze it with Spark, which is faster, a Spark SQL join or a DataFrame join?

Apr 19, 2018 in Apache Spark by Ashish
• 2,650 points • 2,879 views

1 answer to this question.

DataFrame and Spark SQL joins perform about the same, since both are planned by the same Catalyst optimizer and run on the same execution engine. In analyses involving aggregation and sorting, Spark SQL showed a slight advantage.
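One way to check this on your own data is to compare the physical plans Spark produces for the two styles of join. The snippet below is only a rough sketch: the `users` and `orders` tables, their columns, and the helper names `strip_expr_ids`, `captured_plan` and `compare_join_plans` are made up for illustration, with small in-memory tables standing in for data loaded from HBase. It assumes you already have a `SparkSession` and that `explain()` prints the plan from Python, which is how PySpark behaves in the versions I have used.

```python
import re
from contextlib import redirect_stdout
from io import StringIO


def strip_expr_ids(plan: str) -> str:
    """Drop Spark's per-query expression IDs, e.g. `id#12L` becomes `id`."""
    return re.sub(r"#\d+L?", "", plan)


def captured_plan(df) -> str:
    """Capture the text printed by DataFrame.explain() and strip expression IDs."""
    buf = StringIO()
    with redirect_stdout(buf):
        df.explain()
    return strip_expr_ids(buf.getvalue())


def compare_join_plans(spark) -> bool:
    """Build the same join via SQL and via the DataFrame API, then compare plans.

    `spark` is an existing SparkSession supplied by the caller.
    """
    # Hypothetical toy tables; replace with your own DataFrames.
    users = spark.createDataFrame([(1, "ann"), (2, "bob")], ["id", "name"])
    orders = spark.createDataFrame(
        [(1, 9.5), (1, 3.0), (2, 7.25)], ["user_id", "amount"]
    )
    users.createOrReplaceTempView("users")
    orders.createOrReplaceTempView("orders")

    sql_join = spark.sql(
        "SELECT u.name, o.amount FROM users u JOIN orders o ON u.id = o.user_id"
    )
    df_join = users.join(orders, users["id"] == orders["user_id"]).select(
        "name", "amount"
    )

    sql_plan = captured_plan(sql_join)
    df_plan = captured_plan(df_join)
    print(sql_plan)
    print(df_plan)
    return sql_plan == df_plan
```

To try it, create a session with `SparkSession.builder.getOrCreate()` from `pyspark.sql` and pass it to `compare_join_plans`. If the two printed plans match apart from expression IDs, any timing difference you measure is more likely to come from caching, cluster load or data skew than from the choice of API.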

Hope this helps

answered Apr 19, 2018 by kurt_cobain
• 9,350 points

