map vs flatMap in Spark

+1 vote

Please explain to me the difference between map() and flatMap() in Spark.

Thanks

Mar 8, 2019 in Apache Spark by Tina
39,959 views

3 answers to this question.

+1 vote

Both map() and flatMap() are used for transformations. 

The map() transformation takes a function, applies it to each element of the RDD, and returns a new RDD in which each element is the function's result for the corresponding input element. flatMap() is used when each input element can produce multiple output elements. As with map(), the function we provide to flatMap() is called individually for each element of the input RDD. Instead of returning a single element, however, the function returns an iterator of values, and the resulting RDD contains the elements from all of those iterators rather than the iterators themselves.
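To make the difference concrete, here is a minimal sketch in plain Python that models the two transformations on an ordinary list. The helper names rdd_map and rdd_flat_map are made up for illustration and are not Spark functions; in PySpark the corresponding calls would be rdd.map(f) and rdd.flatMap(f) on an RDD.

```python
from itertools import chain

def rdd_map(func, elements):
    # Models rdd.map(func): exactly one output per input element.
    return [func(x) for x in elements]

def rdd_flat_map(func, elements):
    # Models rdd.flatMap(func): func returns an iterable per element,
    # and all of those iterables are concatenated into one flat result.
    return list(chain.from_iterable(func(x) for x in elements))

lines = ["hello world", "map vs flatMap"]

# Same function in both cases: split a line into words.
mapped = rdd_map(lambda line: line.split(" "), lines)
flat_mapped = rdd_flat_map(lambda line: line.split(" "), lines)

print(mapped)       # [['hello', 'world'], ['map', 'vs', 'flatMap']]
print(flat_mapped)  # ['hello', 'world', 'map', 'vs', 'flatMap']
```

With map the result keeps one entry (a list of words) per line; with flatMap the word lists are merged into a single collection of words.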

answered Mar 8, 2019 by Raj
+2 votes

Spark's map function expresses a one-to-one transformation: it transforms each element of a collection into exactly one element of the resulting collection. Spark's flatMap function, by contrast, expresses a one-to-many transformation: it transforms each element into zero or more elements.
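The "zero or more" part can be illustrated with a small plain-Python sketch (a model of the semantics, not Spark code). The helper parse_ints and the sample data are assumptions made up for this example; in PySpark the flattening lines would roughly correspond to rdd.flatMap(parse_ints).

```python
from itertools import chain

def parse_ints(field):
    # Hypothetical helper: [value] for a valid integer, [] otherwise.
    try:
        return [int(field)]
    except ValueError:
        return []

fields = ["1", "x", "22", ""]

# One-to-one, as with map: the output has as many elements as the input.
lengths = [len(f) for f in fields]

# Zero or one output per input, as with flatMap: invalid fields disappear.
numbers = list(chain.from_iterable(parse_ints(f) for f in fields))

# Zero, one, or many outputs per input: n is repeated n times.
repeated = list(chain.from_iterable([n] * n for n in [0, 1, 3]))

print(lengths)   # [1, 1, 2, 0]
print(numbers)   # [1, 22]
print(repeated)  # [1, 3, 3, 3]
```

Because the function may return an empty collection, flatMap can act as a combined filter and transform, which map alone cannot do.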

answered Jun 17, 2019 by vishal
• 180 points
0 votes

Hi,

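With map, the function is applied to each input item (for example, each line or row of the data) and produces exactly one output item. With flatMap, each input item can be mapped to multiple output items, so the function should return a Seq rather than a single item. It is therefore most often used with functions that return an array or other collection.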
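Why the function should return a sequence can be shown with a plain-Python stand-in for flatMap. This is only a sketch of the semantics, and flat_map here is a made-up helper, not the Spark API. It also shows a pitfall that applies when the function returns a string, since a Python string is itself iterable.

```python
from itertools import chain

def flat_map(func, elements):
    # Plain-Python stand-in for rdd.flatMap(func); not Spark itself.
    return list(chain.from_iterable(func(x) for x in elements))

rows = ["a,b", "c"]

# Returning a list per row: the lists are flattened by one level.
cells = flat_map(lambda row: row.split(","), rows)
print(cells)  # ['a', 'b', 'c']

# Returning a bare string: it is iterated character by character.
chars = flat_map(lambda row: row.upper(), rows)
print(chars)  # ['A', ',', 'B', 'C']

# Returning a single non-iterable item fails in this model.
try:
    flat_map(lambda row: len(row), rows)
    failed = False
except TypeError:
    failed = True
print(failed)  # True
```

If the function naturally yields a single value per item, map is the better fit; flatMap is for functions that return a collection per item.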

answered Dec 16, 2020 by MD
• 95,460 points
