When not to use foreachPartition and mapPartition?

0 votes
Are there any scenarios where the use of map() and foreach() is preferred over mapPartitions() and foreachPartition()?
Apr 30, 2018 in Apache Spark by shams
• 3,580 points
1,475 views

1 answer to this question.

0 votes
With mapPartitions() or foreachPartition(), the function you pass receives only the data of a single partition, and it runs on the executors, not on the driver. Because of this, you cannot use driver-side objects such as the SparkSession or DataFrames inside that function; those can only be accessed from driver code.
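
As a minimal sketch (assuming a local Scala Spark job; the object name PartitionOpsSketch and the sample data are illustrative, not from the original answer), the snippet below shows a mapPartitions() call whose body runs on the executors and therefore must not reference the SparkSession created on the driver:

import org.apache.spark.sql.SparkSession

object PartitionOpsSketch {
  def main(args: Array[String]): Unit = {
    // Driver-side objects: SparkSession and SparkContext live only on the driver.
    val spark = SparkSession.builder()
      .appName("mapPartitions-sketch")
      .master("local[*]")
      .getOrCreate()

    val rdd = spark.sparkContext.parallelize(1 to 10, numSlices = 2)

    // mapPartitions invokes the function once per partition on an executor,
    // handing it an iterator over that partition's elements only.
    val doubled = rdd.mapPartitions { iter =>
      // Anything created here lives on the executor.
      // Do NOT reference `spark` (the SparkSession) or any DataFrame here --
      // those exist only on the driver and cannot be used inside this closure.
      iter.map(_ * 2)
    }

    println(doubled.collect().mkString(", "))

    spark.stop()
  }
}

In this sketch, any per-partition setup (for example, opening a database connection) would go at the top of the mapPartitions block so it is paid once per partition rather than once per element; when there is no such setup cost, plain map() or foreach() is the simpler choice.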
answered Apr 30, 2018 by Data_Nerd
• 2,340 points

