map() and flatMap()

0 votes
What is the difference between map() and flatMap()?
Jun 20, 2018 in Apache Spark by Ashish
• 2,630 points
40 views

2 answers to this question.

0 votes
The map() transformation takes a function and applies it to each element of the RDD; the function's result becomes the corresponding element of the resulting RDD, so the output always has exactly as many elements as the input. Sometimes we want to produce multiple output elements (or none) for each input element. The operation for this is flatMap(). As with map(), the function we provide to flatMap() is called individually for each element of the input RDD, but instead of returning a single element it returns a collection or iterator of values. flatMap() then flattens these into a single RDD, so the result contains the values themselves rather than one collection per input element.
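To make the difference concrete, here is a minimal sketch that uses plain Scala collections in place of an RDD, so it runs without a Spark installation. That substitution is an assumption made for illustration, and the object name MapVsFlatMap and the sample input are hypothetical. RDD.map and RDD.flatMap are called the same way.

```scala
// Plain Scala Lists stand in for an RDD here (an assumption, so that the
// sketch runs without Spark); the calls have the same shape on an RDD.
object MapVsFlatMap {
  // Hypothetical sample input: two lines of text
  val lines: List[String] = List("hello world", "hi")

  // map(): exactly one output per input, so each line becomes one list of words
  val mapped: List[List[String]] = lines.map(_.split(" ").toList)

  // flatMap(): the per-line word lists are flattened into one list of words
  val flattened: List[String] = lines.flatMap(_.split(" ").toList)

  def main(args: Array[String]): Unit = {
    println(mapped)    // List(List(hello, world), List(hi))
    println(flattened) // List(hello, world, hi)
  }
}
```

With map() the result keeps one entry per input line (two here), while flatMap() yields three words. On an RDD built with sc.parallelize(lines), the same two calls would give an RDD[List[String]] and an RDD[String] respectively.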
answered Jun 20, 2018 by kurt_cobain
• 9,260 points
0 votes

map(): Returns a new distributed dataset formed by passing each element of the source through a function.

Example: 

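// Distribute five words across 3 partitions
val a = sc.parallelize(List("dog", "salmon", "salmon", "rat", "elephant"), 3)
// map() produces exactly one output (the word's length) per input word
val b = a.map(_.length)
// Pair each word with its length
val c = a.zip(b)
c.collect
// Array((dog,3), (salmon,6), (salmon,6), (rat,3), (elephant,8))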

flatMap(): Similar to map(), but each input item can be mapped to zero or more output items, so the function returns a sequence rather than a single item.

Example:

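// Distribute the numbers 1 to 10 across 5 partitions
val a = sc.parallelize(1 to 10, 5)
// Each n expands to the range 1 to n; the ranges are flattened into one RDD of 55 elements
a.flatMap(1 to _).collect
// Array(1, 1, 2, 1, 2, 3, 1, 2, 3, 4, ...)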

answered Jul 3, 2018 by zombie
• 3,690 points

