How to find the max value in a pair RDD?

I have a Spark pair RDD of (key, count) values, as below:

Array[(String, Int)] = Array((a,1), (b,2), (c,1), (d,3))
How do I find the key with the highest count using Spark?

Note: the datatype of the pair RDD is org.apache.spark.rdd.RDD[(String, Int)].
May 25, 2018 in Apache Spark by kurt_cobain

1 answer to this question.


If you collect the pairs into a local Array (fine when the data fits in driver memory), use the maxBy method:

val a = Array(("a",1), ("b",2), ("c",1), ("d",3))
val maxKey = a.maxBy(_._2)
// maxKey: (String, Int) = (d,3)
Or, to stay distributed, use RDD.max with a custom Ordering that compares pairs by their count:

val maxKey2 = rdd.max()(new Ordering[(String, Int)]() {
  // Compare pairs by their second element (the count)
  override def compare(x: (String, Int), y: (String, Int)): Int =
    Ordering[Int].compare(x._2, y._2)
})
// maxKey2: (String, Int) = (d,3)
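
For something more concise, here is a minimal self-contained sketch, assuming a local Spark setup (the object name MaxPair, the app name, and the variable names are illustrative): Ordering.by builds the comparison from a key function, and reduce avoids constructing an Ordering entirely.

import org.apache.spark.{SparkConf, SparkContext}

object MaxPair {
  def main(args: Array[String]): Unit = {
    // Local SparkContext for demonstration; names here are assumptions
    val sc = new SparkContext(new SparkConf().setAppName("MaxPair").setMaster("local[*]"))
    val rdd = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 1), ("d", 3)))

    // max with an Ordering derived from the count field
    val viaMax = rdd.max()(Ordering.by[(String, Int), Int](_._2))

    // reduce keeps whichever of each two pairs has the larger count
    val viaReduce = rdd.reduce((x, y) => if (x._2 >= y._2) x else y)

    println(viaMax)    // (d,3)
    println(viaReduce) // (d,3)
    sc.stop()
  }
}

Both run in a single distributed pass; if several keys tie for the highest count, each returns one of them.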

answered May 25, 2018 by nitinrawat895
