How to find max value in pair RDD?

I have a Spark pair RDD of (key, count) pairs, as below:

Array[(String, Int)] = Array((a,1), (b,2), (c,1), (d,3))

How do I find the key with the highest count using Spark?

Note: the type of the pair RDD is org.apache.spark.rdd.RDD[(String, Int)].
May 25, 2018 in Apache Spark by kurt_cobain

1 answer to this question.

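If the data is already a local Array (for example, after rdd.collect()), use the Array.maxBy method:

val a = Array(("a",1), ("b",2), ("c",1), ("d",3))
val maxKey = a.maxBy(_._2)
// maxKey: (String, Int) = (d,3); the key itself is maxKey._1

Or, to stay on the RDD without collecting it, use RDD.max with an Ordering that compares only the counts: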

val maxKey2 = rdd.max()(new Ordering[(String, Int)] {
  // Compare pairs by count only; the key is ignored.
  override def compare(x: (String, Int), y: (String, Int)): Int =
    Ordering[Int].compare(x._2, y._2)
})
// maxKey2: (String, Int) = (d,3)
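
As an alternative sketch, the same result can be obtained with RDD.reduce, which avoids writing an Ordering by hand. The object name MaxByCount, the helper larger, and the local SparkSession settings below are illustrative assumptions, not part of the original answer. Note that both reduce and max throw an exception on an empty RDD, and that which pair wins when counts are tied depends on the order in which partitions are merged.

```scala
import org.apache.spark.sql.SparkSession

object MaxByCount {
  // Hypothetical helper: keeps whichever pair has the larger count.
  def larger(x: (String, Int), y: (String, Int)): (String, Int) =
    if (x._2 >= y._2) x else y

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("max-by-count")
      .master("local[*]")
      .getOrCreate()

    val rdd = spark.sparkContext.parallelize(
      Seq(("a", 1), ("b", 2), ("c", 1), ("d", 3)))

    // reduce combines pairs within each partition, then merges the partial results.
    val maxPair = rdd.reduce((x, y) => larger(x, y))

    println(s"${maxPair._1} -> ${maxPair._2}") // prints "d -> 3"

    spark.stop()
  }
}
```

Only the single winning pair is sent back to the driver, so this works on RDDs too large to collect.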

answered May 25, 2018 by nitinrawat895
