Spark foldByKey doubt

val a = spark.sparkContext.parallelize(Array(("a",1),("a",2),("b",2)))
val b = a.foldByKey(1)(_+_)

scala> b.collect
res2: Array[(String, Int)] = Array((b,3), (a,5))

Can someone tell me why the value for key a is 5 and not 4?

Jun 19, 2019 in Apache Spark by Jai
260 views

1 answer to this question.


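foldByKey applies the zero value once per key in each partition, not once per element and not once overall. Your output shows that the two a records were in different partitions, so the zero value 1 was added twice for a:

(a,1) (a,2) => foldByKey(1)(_+_) => (1+1) + (1+2) => 2+3 = 5

(b,2) => foldByKey(1)(_+_) => (1+2) = 3

That is why the value for a is 5. If both a records were in a single partition, the result would be 1+1+2 = 4. To get a plain sum that does not depend on partitioning, use a neutral zero value: foldByKey(0)(_+_).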
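The per-partition behaviour can be sketched in plain Scala without a cluster. This is only a model of the semantics, not Spark's implementation: `foldByKeyModel` is a hypothetical helper, and the partition layout in `split` is an assumption chosen to match the output in the question.

```scala
// Plain-Scala model of foldByKey. `foldByKeyModel` is a hypothetical helper, not a Spark API.
def foldByKeyModel(partitions: Seq[Seq[(String, Int)]], zero: Int)(
    f: (Int, Int) => Int): Map[String, Int] = {
  // Step 1: inside each partition, fold every key's values starting from the zero value.
  val perPartition: Seq[Map[String, Int]] = partitions.map { part =>
    part.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).foldLeft(zero)(f) }
  }
  // Step 2: merge the per-partition results with the same function, this time without the zero value.
  perPartition.flatten.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }
}

// Assumed layout: the two "a" records sit in different partitions.
val split = Seq(Seq(("a", 1)), Seq(("a", 2), ("b", 2)))
// Same data in a single partition.
val single = Seq(Seq(("a", 1), ("a", 2), ("b", 2)))

val splitResult = foldByKeyModel(split, 1)(_ + _)   // Map(a -> 5, b -> 3)
val singleResult = foldByKeyModel(single, 1)(_ + _) // Map(a -> 4, b -> 3)
```

The only difference between the two calls is how the records are grouped into partitions, which is exactly what changes the value for a from 4 to 5.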

answered Jun 19, 2019 by Tina

scala> val a= spark.sparkContext.parallelize(Array(("a",1),("a",2),("b",2),("a",2)))

a: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[0] at parallelize at <console>:23

scala> val b =a.foldByKey(1)(_+_).collect

b: Array[(String, Int)] = Array((b,3), (a,7))

scala> val a= spark.sparkContext.parallelize(Array(("a",1),("a",2),("b",2),("b",3),("a",5)))

a: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[9] at parallelize at <console>:23

scala> val b =a.foldByKey(1)(_+_)

b: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[10] at foldByKey at <console>:25

scala> b.collect

res6: Array[(String, Int)] = Array((b,6), (a,10))

Q> Can anyone clarify why the result is Array((b,6), (a,10)) instead of Array((b,7), (a,11))?

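Hey, @Sitaram,

The zero value 1 is added once per key in each partition, not once per element, which is why you do not get Array((b,7), (a,11)). Your output is consistent with two partitions, for example [(a,1),(a,2)] and [(b,2),(b,3),(a,5)]: a = (1+1+2) + (1+5) = 10 and b = 1+2+3 = 6. The same holds for your first example, where a = (1+1+2) + (1+2) = 7.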
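One way to check this yourself is to fix the number of partitions explicitly and inspect the layout with glom. The snippet below is a sketch that assumes a local Spark installation; the `local[2]` master and the app name are placeholders, and inside spark-shell `getOrCreate()` simply reuses the existing session.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the running session in spark-shell; otherwise starts a local one (assumed settings).
val spark = SparkSession.builder().master("local[2]").appName("foldByKeyDemo").getOrCreate()
val sc = spark.sparkContext

val data = Seq(("a", 1), ("a", 2), ("b", 2), ("b", 3), ("a", 5))

// Two partitions: key a appears in both, key b in only one.
val twoParts = sc.parallelize(data, 2)
val layout = twoParts.glom().collect().map(_.toList).toList
val withTwo = twoParts.foldByKey(1)(_ + _).collect().toMap

// One partition: every key receives the zero value exactly once.
val withOne = sc.parallelize(data, 1).foldByKey(1)(_ + _).collect().toMap

// Neutral zero value: the result no longer depends on the partitioning.
val neutral = twoParts.foldByKey(0)(_ + _).collect().toMap

println(layout)  // how the records were split across partitions
println(withTwo) // a = 10, b = 6
println(withOne) // a = 9,  b = 6
println(neutral) // a = 8,  b = 5
```

When no partition count is passed to parallelize, Spark falls back to its default parallelism, so the same foldByKey(1) call can give different numbers on different machines. A zero value that is neutral for the function (0 for addition) avoids that.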
