Average function is not commutative and associative?

0 votes

Hi,

I am not getting what is wrong with the code while finding the average, the average function is not showing commutative and associative. Can someone help how can I change the code to make it work properly?

Here is the code given:

def sum(x, y):

return x+y;

total = myrdd.reduce(sum);

avg = total / myrdd.count();

 
Jul 22 in Apache Spark by Leena
59 views

1 answer to this question.

0 votes

Hey,

I guess the only problem with the code is that the total might become very big thus overflow. So, I would rather divide each number by count and then sum in the following way.

You can use this code to see the result in a better manner:

cnt = myrdd.count();

def devideByCnd(x):

return x/cnt;

myrdd1 = myrdd.map(devideByCnd);

avg = myrdd.reduce(sum);

answered Jul 22 by Gitika
• 25,300 points

Related Questions In Apache Spark

0 votes
1 answer

When not to use foreachPartition and mapPartition?

With mapPartion() or foreachPartition(), you can only ...READ MORE

answered Apr 30, 2018 in Apache Spark by Data_Nerd
• 2,360 points
1,998 views
0 votes
1 answer

Is it possible to run Spark and Mesos along with Hadoop?

Yes, it is possible to run Spark ...READ MORE

answered May 29, 2018 in Apache Spark by Data_Nerd
• 2,360 points
39 views
+1 vote
3 answers

What is the difference between rdd and dataframes in Apache Spark ?

Comparison between Spark RDD vs DataFrame 1. Release ...READ MORE

answered Aug 27, 2018 in Apache Spark by shams
• 3,580 points
11,822 views
0 votes
1 answer
0 votes
1 answer
0 votes
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,490 points
2,341 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,490 points
238 views
0 votes
10 answers

hadoop fs -put command?

put syntax: put <localSrc> <dest> copy syntax: copyFr ...READ MORE

answered Dec 7, 2018 in Big Data Hadoop by Aditya
12,017 views
0 votes
1 answer

What is the difference between persist() and cache() in apache spark?

Hi, persist () allows the user to specify ...READ MORE

answered Jul 3 in Apache Spark by Gitika
• 25,300 points
150 views
0 votes
1 answer

How SparkSQL is different from HQL and SQL?

Hi, SparkSQL is a special component on the ...READ MORE

answered Jul 3 in Apache Spark by Gitika
• 25,300 points
31 views