Why does the sortBy transformation trigger a Spark job?

As far as I know, only Spark actions can trigger a Spark job. Transformations should not, because Spark evaluates lazily.

Yet when I execute sortBy, the Spark web UI shows that a job was triggered.

Any idea why this happens?

May 8, 2018 in Apache Spark by shams
• 3,670 points • 3,005 views

1 answer to this question.

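sortBy/sortByKey depends on a RangePartitioner (on the JVM). When you call sortBy/sortByKey, the partitioner is initialized eagerly, and to compute the partition boundaries it samples the input RDD. That sampling pass has to read the data, so it runs as a job. The actual sorting still happens only when an action is invoked on the resulting RDD.

The job you see in the web UI corresponds to this sampling step, not to the sort itself.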
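To make the eager/lazy split concrete, here is a toy model in plain Python. It is not Spark code and does not reproduce RangePartitioner's real sampling algorithm; `LazyDataset`, `SortedPlan`, `jobs_run` and the sample size of 20 are all hypothetical names and values made up for illustration. It only shows the shape of the behaviour: choosing range boundaries needs one pass over the data up front, while the sort itself waits for an action.

```python
import bisect
import random


class LazyDataset:
    """Toy stand-in for an RDD: a list of partitions plus a job counter."""

    def __init__(self, partitions):
        self.partitions = partitions
        self.jobs_run = 0  # hypothetical counter, analogous to jobs listed in the web UI

    def _run_job(self, func):
        # Any pass over the partitions counts as one job.
        self.jobs_run += 1
        return [func(p) for p in self.partitions]

    def sort_by(self, key, num_partitions, seed=0):
        # Eager part: sample each partition to pick range boundaries.
        rng = random.Random(seed)
        samples = self._run_job(lambda p: rng.sample(p, min(len(p), 20)))
        keys = sorted(key(x) for part in samples for x in part)
        bounds = []
        if keys:
            step = len(keys) / num_partitions
            bounds = [keys[int(step * i)] for i in range(1, num_partitions)]
        # Lazy part: return a plan only; nothing has been sorted yet.
        return SortedPlan(self, key, bounds)


class SortedPlan:
    """Describes a range-partitioned sort that runs only on collect()."""

    def __init__(self, parent, key, bounds):
        self.parent, self.key, self.bounds = parent, key, bounds

    def collect(self):
        # Action: route every element to its range bucket, then sort each bucket.
        buckets = [[] for _ in range(len(self.bounds) + 1)]

        def route(partition):
            for x in partition:
                buckets[bisect.bisect_left(self.bounds, self.key(x))].append(x)

        self.parent._run_job(route)
        return [x for b in buckets for x in sorted(b, key=self.key)]


data = LazyDataset([[5, 3, 9], [1, 8, 2], [7, 4, 6]])
plan = data.sort_by(lambda x: x, num_partitions=3)
print(data.jobs_run)   # 1: the sampling pass already ran
print(plan.collect())  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(data.jobs_run)   # 2: the sort ran only when collect() was called
```

In this sketch the first counter increment plays the role of the job that shows up in the web UI right after sortBy, and the second one is the job triggered by the action.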

answered May 8, 2018 by kurt_cobain
• 9,350 points
