How to get the number of elements in a partition?

0 votes
I'm exploring Apache Spark and would like to know whether there is a way to get the number of elements in a particular RDD partition, given its partition ID.

Any help would be appreciated, as this would make my task much easier.

Thanks in advance.
May 8, 2018 in Apache Spark by Data_Nerd
• 2,360 points
265 views

1 answer to this question.

0 votes
rdd.mapPartitions(iter => Array(iter.size).iterator, true) 

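This returns a new RDD with one element per partition, each element being the size of that partition. Call collect() on it to bring the sizes back to the driver as an array, in partition order.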
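To look up the size by partition ID, as the question asks, the index has to be carried along with the count. The sketch below is one way to do that, assuming a local Spark session and made-up sample data; the names `PartitionSizes`, `sizesById`, and `sizeOf` are hypothetical helpers, not part of Spark. `sizesById` uses `mapPartitionsWithIndex` to count every partition, while `sizeOf` uses `SparkContext.runJob` to run the count on a single partition only.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object PartitionSizes {
  // Hypothetical helper: element count of every partition, keyed by partition ID.
  def sizesById[T](rdd: RDD[T]): Map[Int, Int] =
    rdd
      .mapPartitionsWithIndex((idx, iter) => Iterator((idx, iter.size)))
      .collect()
      .toMap

  // Hypothetical helper: element count of one partition.
  // runJob only schedules a task for the partition IDs passed in.
  def sizeOf[T](rdd: RDD[T], partitionId: Int): Int =
    rdd.sparkContext
      .runJob(rdd, (iter: Iterator[T]) => iter.size, Seq(partitionId))
      .head

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this illustration.
    val spark = SparkSession.builder()
      .appName("partition-sizes")
      .master("local[2]")
      .getOrCreate()

    // Made-up sample data: 10 integers spread over 3 partitions.
    val rdd = spark.sparkContext.parallelize(1 to 10, numSlices = 3)

    println(sizesById(rdd))
    println(sizeOf(rdd, partitionId = 1))

    spark.stop()
  }
}
```

Both helpers consume the partition's iterator to count it, so on a large RDD that is not cached the data for the counted partitions is recomputed; `sizeOf` limits that work to the one partition requested.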

answered May 8, 2018 by kurt_cobain
• 9,280 points
