How to find the max value in a pair RDD?

I have a Spark pair RDD of (key, count) pairs, as below:

Array[(String, Int)] = Array((a,1), (b,2), (c,1), (d,3))

How do I find the key with the highest count using Spark?

Note: the datatype of the pair RDD is org.apache.spark.rdd.RDD[(String, Int)]
May 25, 2018 in Apache Spark by kurt_cobain

1 answer to this question.


Use the maxBy method on a local Array:

val a = Array(("a",1), ("b",2), ("c",1), ("d",3))
val maxKey = a.maxBy(_._2)
// maxKey: (String, Int) = (d,3)
Or use RDD.max with a custom Ordering directly on the RDD:

// rdd is the pair RDD from the question: RDD[(String, Int)]
// Compare pairs by their count (the second element of each tuple)
val maxKey2 = rdd.max()(new Ordering[(String, Int)]() {
  override def compare(x: (String, Int), y: (String, Int)): Int =
    Ordering[Int].compare(x._2, y._2)
})
// maxKey2: (String, Int) = (d,3)
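A more concise variant of the same idea uses Ordering.by instead of a hand-written compare, or a plain reduce that keeps the pair with the larger count. This is a minimal sketch, assuming a SparkContext sc is available and that rdd holds the data from the question; the variable names are illustrative:

import org.apache.spark.rdd.RDD

// Illustrative pair RDD matching the question's data
val rdd: RDD[(String, Int)] = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 1), ("d", 3)))

// max with an Ordering derived from the count field
val byCount = rdd.max()(Ordering.by[(String, Int), Int](_._2))
// byCount: (String, Int) = (d,3)

// Equivalent without an Ordering: keep the pair with the larger count in a reduce
val byReduce = rdd.reduce((a, b) => if (a._2 >= b._2) a else b)
// byReduce: (String, Int) = (d,3)

Both versions stay distributed, so they work even when the RDD is too large to collect to the driver.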

answered May 25, 2018 by nitinrawat895
