How can I calculate the exact median with Apache Spark?

0 votes

This page contains some statistics functions (mean, stdev, variance, etc.), but it does not include the median. How can I calculate the exact median?

Oct 8, 2018 in Big Data Hadoop by slayer
• 29,040 points
261 views

1 answer to this question.

0 votes

You need to sort the RDD and take the middle element, or the average of the two middle elements when the element count is even. Here is an example with RDD[Int]:

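  import org.apache.spark.SparkContext._ // pair-RDD implicits; needed on Spark versions before 1.3
  import org.apache.spark.rdd.RDD

  val rdd: RDD[Int] = ??? // placeholder: supply your own RDD

  // Sort, then key each value by its position so it can be fetched by index.
  val sorted = rdd.sortBy(identity).zipWithIndex().map {
    case (v, idx) => (idx, v)
  }.cache() // reused by count() and lookup() below

  val count = sorted.count()
  require(count > 0, "median of an empty RDD is undefined")

  val median: Double = if (count % 2 == 0) {
    // Even count: average the two middle elements.
    val l = count / 2 - 1
    val r = l + 1
    // Convert to Double before adding so the Int sum cannot overflow.
    (sorted.lookup(l).head.toDouble + sorted.lookup(r).head.toDouble) / 2
  } else sorted.lookup(count / 2).head.toDouble // odd count: the middle element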
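
The index arithmetic above can be checked without a cluster. The following is a minimal sketch, not part of the original answer, that applies the same odd/even rule to a local Seq[Int]. The name `localMedian` is a hypothetical helper chosen for illustration, not a Spark API.

```scala
// Hypothetical helper (not a Spark API): the same median rule on a local collection.
def localMedian(values: Seq[Int]): Double = {
  require(values.nonEmpty, "median of an empty collection is undefined")

  // Sort once and use an indexed sequence for constant-time access by position.
  val sorted = values.sorted.toIndexedSeq
  val count = sorted.length

  if (count % 2 == 0) {
    // Even count: average the two middle elements.
    val l = count / 2 - 1
    val r = l + 1
    // Convert to Double before adding so the Int sum cannot overflow.
    (sorted(l).toDouble + sorted(r).toDouble) / 2
  } else {
    // Odd count: the single middle element.
    sorted(count / 2).toDouble
  }
}
```

For example, `localMedian(Seq(4, 1, 3, 2))` sorts the values to 1, 2, 3, 4 and averages the elements at positions 1 and 2, giving 2.5. Running the Spark version on a small RDD with the same values is one way to confirm both agree.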
answered Oct 8, 2018 by Omkar
• 65,810 points

