PySpark RDD: How to get the partition number in the output?

0 votes
z = sc.parallelize([1,2,3,4,5,6], 2)

How can I get the partition number of each element in the output?

Jan 8 in Python by digger
• 27,640 points
155 views

1 answer to this question.

0 votes

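The glom function is what you are looking for:

glom(self): Return an RDD created by coalescing all elements within each partition into a list.

Each inner list in the collected result holds one partition, in partition order, so the position of a list is its partition number: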

a = sc.parallelize(range(10), 5)
a.glom().collect()
# Output: [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
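
If you want the partition number attached to each element explicitly, rather than reading it off the list position, mapPartitionsWithIndex may be a better fit. The following is only a minimal sketch: the helper names tag_with_partition and with_partition_numbers are made up for illustration, and it assumes you already have an RDD such as z from the question.

```python
def tag_with_partition(index, iterator):
    """Pair every element of one partition with that partition's index."""
    for element in iterator:
        yield (index, element)


def with_partition_numbers(rdd):
    """Collect (partition_index, element) pairs for every element of an RDD.

    Hypothetical helper: it only relies on RDD.mapPartitionsWithIndex,
    which calls the given function once per partition with
    (partition_index, iterator_over_elements).
    """
    return rdd.mapPartitionsWithIndex(tag_with_partition).collect()


# Usage in the pyspark shell, where `sc` already exists:
#   z = sc.parallelize([1, 2, 3, 4, 5, 6], 2)
#   with_partition_numbers(z)
```

Keep in mind that collect() brings everything back to the driver, so this is meant for inspecting small RDDs like the one in the question.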
answered Jan 8 by Omkar
• 67,460 points
