Average function is not commutative and associative?


Hi,

I don't understand what is wrong with my code for computing the average: I'm told the average function is not commutative and associative. Can someone explain how to change the code so that it works properly?

Here is the code:

# add two elements; used as the reduce function
def sum(x, y):
    return x + y

# myrdd is an existing RDD of numbers
total = myrdd.reduce(sum)
avg = total / myrdd.count()
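To make the "commutative and associative" requirement concrete: `RDD.reduce` reduces each partition separately and then merges the partial results, so the function passed to it has to give the same answer whatever the grouping and order. The plain-Python sketch below needs no Spark: `functools.reduce` stands in for `RDD.reduce`, and `pairwise_avg` / `add_pairs` are hypothetical helper names. It shows that averaging two values at a time is commutative but not associative, whereas adding `(sum, count)` pairs is both.

```python
from functools import reduce

def pairwise_avg(x, y):
    # naive two-value average: commutative, but NOT associative
    return (x + y) / 2

def add_pairs(a, b):
    # combine (sum, count) pairs component-wise: associative and commutative
    return (a[0] + b[0], a[1] + b[1])

values = [1, 2, 3]

# the same three values, grouped two different ways
left = pairwise_avg(pairwise_avg(1, 2), 3)   # (1.5 + 3) / 2
right = pairwise_avg(1, pairwise_avg(2, 3))  # (1 + 2.5) / 2
print(left, right)  # 2.25 1.75 (the true mean is 2.0)

# every element becomes (value, 1); any grouping yields the same (total, count)
total, count = reduce(add_pairs, [(v, 1) for v in values])
print(total / count)  # 2.0
```

Because `add_pairs` is associative, the same idea should carry over to Spark as `myrdd.map(lambda v: (v, 1)).reduce(add_pairs)`; this pairing is a common alternative technique, not something the original post uses.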

Jul 22 in Apache Spark by Leena

1 answer to this question.


Hey,

I think the only problem with the code is that the total can become very large and overflow, so I would rather divide each number by the count first and then sum the results. You can use this code:

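cnt = myrdd.count()

# scale each element by 1/cnt so the partial sums stay small;
# float() keeps the division fractional under Python 2 as well
def divideByCount(x):
    return x / float(cnt)

myrdd1 = myrdd.map(divideByCount)
# reduce the scaled RDD (myrdd1), not the original myrdd
avg = myrdd1.reduce(sum)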
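Both approaches can be tried locally without a cluster. This is a minimal plain-Python sketch, assuming a Python list stands in for `myrdd` and `functools.reduce` for `RDD.reduce`; the function names are hypothetical. It uses float values near the top of the double range, where summing first overflows to `inf` while dividing first does not.

```python
from functools import reduce

def add(x, y):
    # same role as the `sum` function passed to reduce in the post
    return x + y

def mean_sum_then_divide(values):
    # the question's approach: one big total, divided once at the end
    return reduce(add, values) / len(values)

def mean_divide_then_sum(values):
    # the answer's approach: scale each element by 1/n, then add
    n = len(values)
    return reduce(add, [v / n for v in values])

big = [1e308, 1e308]
print(mean_sum_then_divide(big))  # inf: 1e308 + 1e308 overflows a double
print(mean_divide_then_sum(big))  # 1e+308
```

Dividing first trades the overflow risk for one extra rounding step per element, so on ordinary data the two results may differ in the last few bits. Python `int` values are arbitrary-precision, so the overflow concern mainly applies to floats and to fixed-width types such as a JVM `Long`.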

answered Jul 22 by Gitika
