How SortBykey() operation works in Spark?

0 votes
Can anyone explain how SortBykey() operation works in Spark?
Aug 2 in Apache Spark by Rashi
19 views

1 answer to this question.

0 votes

Hey,

sortByKey() is a transformation.

  • It returns an RDD sorted by Key.
  • Sorting can be done in (1) Ascending OR (2) Descending OR (3) custom sorting

They will work with any key type K that has an implicit Ordering[K] in scope. Ordering objects already exist for all of the standard primitive types. Users can also define their own orderings for custom types, or to override the default ordering. The implicit ordering that is in the closest scope will be used.

When called on Dataset of (K, V) where k is Ordered returns a dataset of (K, V) pairs sorted by keys in ascending or descending order, as specified in the ascending argument.

Here is an example:

<br /> 

val rdd1 = sc.parallelize(Seq(("India",91),("USA",1),("Brazil",55),("Greece",30),("China",86),("Sweden",46),("Turkey",90),("Nepal",977)))

<br />

 val rdd2 = rdd1.sortByKey()<br /> rdd2.collect();<br />

Output:
Array[(String,Int)] = (Array(Brazil,55),(China,86),(Greece,30),(India,91),(Nepal,977),(Sweden,46),(Turkey,90),(USA,1)

answered Aug 2 by Gitika
• 25,300 points

Related Questions In Apache Spark

0 votes
2 answers

In a Spark DataFrame how can I flatten the struct?

// Collect data from input avro file ...READ MORE

answered Jul 4 in Apache Spark by Dhara dhruve
673 views
0 votes
1 answer

How can I write a text file in HDFS not from an RDD, in Spark program?

Yes, you can go ahead and write ...READ MORE

answered May 29, 2018 in Apache Spark by Shubham
• 13,290 points
1,002 views
0 votes
4 answers

How to change the spark Session configuration in Pyspark?

You can dynamically load properties. First create ...READ MORE

answered Dec 10, 2018 in Apache Spark by Vini
11,948 views
0 votes
1 answer

How to convert rdd object to dataframe in spark

SqlContext has a number of createDataFrame methods ...READ MORE

answered May 30, 2018 in Apache Spark by nitinrawat895
• 10,490 points
1,169 views
0 votes
1 answer
0 votes
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,490 points
2,314 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,490 points
237 views
0 votes
10 answers

hadoop fs -put command?

put syntax: put <localSrc> <dest> copy syntax: copyFr ...READ MORE

answered Dec 7, 2018 in Big Data Hadoop by Aditya
11,915 views
0 votes
1 answer

How Foreach Operation works in Apache Spark?

Hi, foreach() operation is an action. It does not ...READ MORE

answered Aug 2 in Apache Spark by Gitika
• 25,300 points
18 views
0 votes
1 answer

How to run spark in Standalone client mode?

Hi, These are the steps to run spark in ...READ MORE

answered Jul 5 in Apache Spark by Gitika
• 25,300 points
27 views