How to get ID of a map task in Spark?

0 votes
Is there a way to get ID of a map task in Spark? For example if each map task calls a user defined function, can I get the ID of that map task from whithin that user defined function?
Nov 20, 2018 in Apache Spark by Neha
• 6,280 points
406 views

1 answer to this question.

0 votes

you can access task information using TaskContext:

import org.apache.spark.TaskContext

sc.parallelize(Seq[Int](), 4).mapPartitions(_ => {
    val ctx = TaskContext.get
    val stageId = ctx.stageId
    val partId = ctx.partitionId
    val hostname = java.net.InetAddress.getLocalHost().getHostName()
    Iterator(s"Stage: $stageId, Partition: $partId, Host: $hostname")
}).collect.foreach(println)

A similar functionality has been added to PySpark in Spark 2.2.0 (SPARK-18576):

from pyspark import TaskContext
import socket

def task_info(*_):
    ctx = TaskContext()
    return ["Stage: {0}, Partition: {1}, Host: {2}".format(
        ctx.stageId(), ctx.partitionId(), socket.gethostname())]

for x in sc.parallelize([], 4).mapPartitions(task_info).collect():
    print(x)
answered Nov 20, 2018 by Frankie
• 9,810 points

Related Questions In Apache Spark

0 votes
1 answer
0 votes
1 answer

How do I access the Map Task ID in Spark?

You can access task information using TaskContext: import org.apache.spark.TaskContext sc.parallelize(Seq[Int](), ...READ MORE

answered Jul 23 in Apache Spark by ravikiran
• 4,560 points
44 views
0 votes
1 answer

How to get the number of elements in partition?

rdd.mapPartitions(iter => Array(iter.size).iterator, true) This command will ...READ MORE

answered May 8, 2018 in Apache Spark by kurt_cobain
• 9,260 points
213 views
0 votes
7 answers

How to print the contents of RDD in Apache Spark?

Simple and easy: line.foreach(println) READ MORE

answered Dec 10, 2018 in Apache Spark by Kuber
11,144 views
0 votes
1 answer
0 votes
1 answer

Is it possible to run Apache Spark without Hadoop?

Though Spark and Hadoop were the frameworks designed ...READ MORE

answered May 2 in Big Data Hadoop by ravikiran
• 4,560 points
71 views
0 votes
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,690 points
3,025 views
0 votes
10 answers

hadoop fs -put command?

put syntax: put <localSrc> <dest> copy syntax: copyFr ...READ MORE

answered Dec 7, 2018 in Big Data Hadoop by Aditya
14,944 views
0 votes
1 answer

How to open/stream .zip files through Spark?

You can try and check this below ...READ MORE

answered Nov 20, 2018 in Apache Spark by Frankie
• 9,810 points
416 views
0 votes
1 answer

What is Executor Memory in a Spark application?

Every spark application has same fixed heap ...READ MORE

answered Jan 4 in Apache Spark by Frankie
• 9,810 points
498 views