Difference between map and mapPartitions functions in Spark

Hi everyone,

Can someone tell me what the basic difference is between map() and mapPartitions() in Spark?

Please give an example.

Thank you

Jan 23, 2020 in Apache Spark by akhtar


Hi @akhtar,

Both map() and mapPartitions() are transformations available on a Spark RDD.

Suppose you have a file that contains 50 lines split across five partitions, so each partition contains 10 lines.

If you apply map(func) to the RDD, func() is applied to every line, so in this case it is called 50 times. Any fixed cost inside func() (for example, opening a connection) is paid on every one of those 50 calls, which makes processing slower.

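If you apply mapPartitions(func) to the RDD, func() is applied once per partition, so in this case it is called only 5 times. Instead of a single line, func() receives an iterator over all the lines in that partition and must return an iterator of results. Because any per-call setup is paid only 5 times, processing can be faster.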
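To make the call counts concrete, here is a small sketch. It is a toy, pure-Python model, not Spark: `ToyRDD`, `upper_line` and `upper_partition` are hypothetical names, and the "partitions" are plain lists. It only mimics the calling convention described above, where map() calls the function once per element and mapPartitions() calls it once per partition with an iterator. In real PySpark the equivalent calls would be `sc.parallelize(lines, 5).map(upper_line)` and `sc.parallelize(lines, 5).mapPartitions(upper_partition)`.

```python
# Toy model (NOT Spark) of map() vs mapPartitions() calling conventions.
# ToyRDD is a hypothetical stand-in for an RDD: just a list of partitions.


class ToyRDD:
    def __init__(self, partitions):
        self.partitions = partitions  # list of lists

    def map(self, func):
        # func is called once for every element
        return ToyRDD([[func(x) for x in part] for part in self.partitions])

    def mapPartitions(self, func):
        # func is called once per partition; it gets an iterator and
        # must return an iterable of results
        return ToyRDD([list(func(iter(part))) for part in self.partitions])

    def collect(self):
        # flatten all partitions back into one list
        return [x for part in self.partitions for x in part]


# 50 lines split into 5 partitions of 10 lines each
lines = [f"line {i}" for i in range(50)]
rdd = ToyRDD([lines[i:i + 10] for i in range(0, 50, 10)])

calls = {"map": 0, "mapPartitions": 0}


def upper_line(line):
    # hypothetical per-element function for map()
    calls["map"] += 1
    return line.upper()


def upper_partition(it):
    # hypothetical per-partition function for mapPartitions()
    calls["mapPartitions"] += 1
    for line in it:
        yield line.upper()


out_map = rdd.map(upper_line).collect()
out_parts = rdd.mapPartitions(upper_partition).collect()
print(calls)  # {'map': 50, 'mapPartitions': 5}
```

Both versions produce the same 50 results; only the number of function invocations differs.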

Hope this helps.

Thank you

answered Jan 29, 2020 by MD

Then why do we need the map function?

commented Sep 16, 2020 by anonymous

Hi,

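It depends on the use case. map() is the simpler choice when each element is transformed independently and the function has no expensive setup. mapPartitions() pays off when the function needs costly initialization that can be shared across a partition, such as a database connection. Because it works on a whole partition through an iterator, though, it is easier to misuse, for example by loading the entire partition into memory. Use whichever fits your requirement.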
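A sketch of the setup-cost case, again as a pure-Python model rather than Spark. `FakeConnection`, `enrich_row` and `enrich_partition` are hypothetical names; the "connection" only counts how often it is created. With real PySpark you would pass these functions to `rdd.map(enrich_row)` and `rdd.mapPartitions(enrich_partition)`.

```python
# Toy model (NOT Spark) of why mapPartitions() helps with expensive setup.
# FakeConnection is a hypothetical stand-in for something like a DB client.


class FakeConnection:
    opened = 0  # counts how many connections were created

    def __init__(self):
        FakeConnection.opened += 1

    def lookup(self, key):
        return key * 2


# 50 rows in 5 partitions of 10
partitions = [list(range(i, i + 10)) for i in range(0, 50, 10)]


def enrich_row(x):
    # map() style: a new connection for every single row
    conn = FakeConnection()
    return conn.lookup(x)


def enrich_partition(rows):
    # mapPartitions() style: one connection per partition, reused for
    # every row; yielding keeps the partition streamed, not materialized
    conn = FakeConnection()
    for x in rows:
        yield conn.lookup(x)


FakeConnection.opened = 0
via_map = [enrich_row(x) for part in partitions for x in part]
opened_by_map = FakeConnection.opened

FakeConnection.opened = 0
via_partitions = [y for part in partitions for y in enrich_partition(iter(part))]
opened_by_partitions = FakeConnection.opened

print(opened_by_map, opened_by_partitions)  # 50 5
```

The results are identical, but the map() version opens 50 connections while the mapPartitions() version opens 5. Writing `enrich_partition` as a generator, instead of calling `list(rows)` first, avoids holding the whole partition in memory.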

commented Sep 16, 2020 by akhtar