Difference between the map() and mapPartitions() functions in Spark?

0 votes

Hi everyone,

Can someone tell me the basic difference between map() and mapPartitions() in Spark?

Please give an example.

Thank You

Jan 23 in Apache Spark by akhtar
• 1,440 points
94 views

1 answer to this question.

0 votes

Hi @akhtar,

Both map() and mapPartitions() are transformations available on Spark RDDs.

Suppose you have a file that contains 50 lines, split across five partitions, so each partition contains 10 lines.

If you apply map(func) to the RDD, func() is called once for every line, so in this case it is called 50 times. Any setup work done inside func() (for example, opening a database connection) is repeated on each of those calls.

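If you apply mapPartitions(func) to the RDD, func() is called once for every partition, so in this case it is called 5 times. Each call receives an iterator over all the lines in that partition and must return an iterator of results. The per-line work is the same, but any expensive setup inside func() runs only 5 times instead of 50, and that is where the speed-up comes from.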
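To make the 50-versus-5 call counts concrete, here is a minimal sketch in plain Python. It does not use Spark: the "partitions" are ordinary lists, and simulate_map, simulate_map_partitions, per_element and per_partition are hypothetical names invented for this illustration. It only models how often func() gets called in each case.

```python
from typing import Callable, Iterator, List

# Stand-in for an RDD: 50 lines split into 5 partitions of 10 lines each.
# (Assumption: plain lists model partitions; this is not Spark itself.)
lines = [f"line {i}" for i in range(50)]
partitions: List[List[str]] = [lines[i:i + 10] for i in range(0, 50, 10)]

# Counts how many times each user function is invoked.
calls = {"map": 0, "mapPartitions": 0}


def per_element(line: str) -> str:
    """map-style function: receives one element, returns one element."""
    calls["map"] += 1
    return line.upper()


def per_partition(part: Iterator[str]) -> Iterator[str]:
    """mapPartitions-style function: receives an iterator, yields results."""
    calls["mapPartitions"] += 1
    # Expensive one-time setup (e.g. opening a connection) would go here,
    # so it would run once per partition rather than once per line.
    for line in part:
        yield line.upper()


def simulate_map(parts: List[List[str]],
                 func: Callable[[str], str]) -> List[List[str]]:
    # Call func once for every element of every partition.
    return [[func(x) for x in part] for part in parts]


def simulate_map_partitions(
        parts: List[List[str]],
        func: Callable[[Iterator[str]], Iterator[str]]) -> List[List[str]]:
    # Call func once per partition, handing it an iterator over that partition.
    return [list(func(iter(part))) for part in parts]


mapped = simulate_map(partitions, per_element)
mapped_by_partition = simulate_map_partitions(partitions, per_partition)

print(calls)                          # {'map': 50, 'mapPartitions': 5}
print(mapped == mapped_by_partition)  # True
```

With a real RDD in PySpark, the corresponding calls would be rdd.map(per_element) and rdd.mapPartitions(per_partition). Both produce the same output here; the difference is only in how many times your function is invoked.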

Hope this helps.

Thank You

answered Jan 29 by MD
• 2,750 points
