val a = spark.sparkContext.parallelize(Array(("a", 1), ("a", 2), ("b", 2)))
val b = a.foldByKey(1)(_ + _)
b.collect
res2: Array[(String, Int)] = Array((b,3), (a,5))
Can someone tell me why the value for key a is 5 and not 4?
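foldByKey applies the zero value once per key in each partition, not once per key overall. Here the two a records sit in different partitions, so the zero value 1 is added to each of them before the partial results are merged:
(a,1) (a,2) => foldByKey(1)(_+_) => (a,1+1) and (a,1+2) => 2+3 = 5
(b,2) => foldByKey(1)(_+_) => (b,1+2) = 3
That is why a comes out as 5. If both a records were in the same partition, the zero value would be added only once and the result would be 4, so the output depends on how the data is partitioned. For a plain sum, use 0 as the zero value.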
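One way to check this is to fix the number of partitions explicitly. The snippet below is a minimal sketch meant to be pasted into spark-shell, assuming `spark` is the SparkSession the shell provides; the names `data`, `onePartition`, `threePartitions` and `zeroAsIdentity` are only illustrative. With three elements and three slices, `parallelize` puts one element in each partition.

```scala
// Assumes spark-shell, where `spark` is already defined.
val sc = spark.sparkContext
val data = Seq(("a", 1), ("a", 2), ("b", 2))

// One partition: the zero value 1 is added once per key.
// a -> 1 + 1 + 2 = 4, b -> 1 + 2 = 3
val onePartition = sc.parallelize(data, 1).foldByKey(1)(_ + _).collect().toMap

// Three partitions, one element each: the zero value is added per partition.
// a -> (1 + 1) + (1 + 2) = 5, b -> 1 + 2 = 3
val threePartitions = sc.parallelize(data, 3).foldByKey(1)(_ + _).collect().toMap

// Zero value 0 is the identity for +, so partitioning no longer matters.
// a -> 3, b -> 2
val zeroAsIdentity = sc.parallelize(data, 3).foldByKey(0)(_ + _).collect().toMap
```

Without an explicit slice count, the number of partitions comes from the default parallelism, which is why the same code can give 4 on one machine and 5 on another.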