map() and flatMap()

0 votes
What is the difference between map() and flatMap()?
Jun 20, 2018 in Apache Spark by Ashish
• 2,630 points
46 views

2 answers to this question.

0 votes
The map() transformation takes a function and applies it to each element of the RDD; the function's result becomes the corresponding element of the new RDD. Sometimes we want to produce multiple output elements for each input element. The operation for this is flatMap(). As with map(), the function we pass to flatMap() is called once for each element of the input RDD, but instead of returning a single element it returns an iterator over its return values. The resulting RDD contains the elements from all of those iterators, not the iterators themselves.
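To make the difference concrete, here is a minimal sketch that uses plain Scala collections instead of an RDD, so it runs without a Spark cluster. The element-wise behaviour of map and flatMap on a List mirrors what the RDD methods do. The object name and the sample strings are made up for illustration.

```scala
object MapVsFlatMapSketch {
  // Hypothetical sample data, not taken from the question.
  val lines: List[String] = List("hello world", "hi")

  // map: exactly one output per input, so the result stays nested.
  val mapped: List[List[String]] = lines.map(_.split(" ").toList)

  // flatMap: each input yields a collection, and the collections are flattened.
  val flatMapped: List[String] = lines.flatMap(_.split(" ").toList)

  def main(args: Array[String]): Unit = {
    println(mapped)      // List(List(hello, world), List(hi))
    println(flatMapped)  // List(hello, world, hi)
  }
}
```

With map() the output has the same number of elements as the input (two lists of words); with flatMap() it has one element per word.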
answered Jun 20, 2018 by kurt_cobain
• 9,240 points
0 votes

map(): Returns a new distributed dataset formed by passing each element of the source through a function.

Example: 

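// Distribute five words across 3 partitions.
val a = sc.parallelize(List("dog", "salmon", "salmon", "rat", "elephant"), 3)

// map: one output (the word's length) per input word.
val b = a.map(_.length)

// Pair each word with its length.
val c = a.zip(b)

c.collect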
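For reference, the same steps can be sketched on a local List, which shows the pairs the example should produce without needing a SparkContext. This is a local stand-in rather than the RDD code itself; the object and value names are hypothetical. On RDDs, zip additionally requires both sides to have the same number of partitions and the same number of elements per partition, which holds above because b is derived from a with map().

```scala
object MapZipSketch {
  val words: List[String] = List("dog", "salmon", "salmon", "rat", "elephant")

  // map keeps a one-to-one correspondence between words and lengths.
  val lengths: List[Int] = words.map(_.length)

  // zip pairs elements by position.
  val pairs: List[(String, Int)] = words.zip(lengths)

  def main(args: Array[String]): Unit = {
    println(pairs) // List((dog,3), (salmon,6), (salmon,6), (rat,3), (elephant,8))
  }
}
```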

flatMap(): Similar to map(), but each input item can be mapped to zero or more output items, so the function returns a sequence rather than a single item.

Example:

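// The numbers 1 to 10, spread across 5 partitions.
val a = sc.parallelize(1 to 10, 5)

// flatMap: each n expands to the range 1 to n, and the ranges are flattened into one RDD.
a.flatMap(1 to _).collect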
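A smaller local version makes the flattening easier to see. The sketch below uses a plain List with the shorter input 1 to 4 (an assumption made to keep the output short) and adds a second, hypothetical case where the function returns an empty collection for some inputs, which map() cannot express.

```scala
object FlatMapRangeSketch {
  val input: List[Int] = (1 to 4).toList

  // Each n expands to 1..n; the ranges are concatenated into one list.
  val expanded: List[Int] = input.flatMap(n => 1 to n)

  // Returning Nil drops the element, so flatMap can also emit zero items.
  val evensOnly: List[Int] = input.flatMap(n => if (n % 2 == 0) List(n) else Nil)

  def main(args: Array[String]): Unit = {
    println(expanded)  // List(1, 1, 2, 1, 2, 3, 1, 2, 3, 4)
    println(evensOnly) // List(2, 4)
  }
}
```

The output of flatMap() can therefore be longer or shorter than its input, whereas map() always preserves the element count.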

answered Jul 3, 2018 by zombie
• 3,690 points
