How to get ID of a map task in Spark?

0 votes
Is there a way to get ID of a map task in Spark? For example if each map task calls a user defined function, can I get the ID of that map task from whithin that user defined function?
Nov 20, 2018 in Apache Spark by Neha
• 6,280 points
213 views

1 answer to this question.

0 votes

you can access task information using TaskContext:

import org.apache.spark.TaskContext

sc.parallelize(Seq[Int](), 4).mapPartitions(_ => {
    val ctx = TaskContext.get
    val stageId = ctx.stageId
    val partId = ctx.partitionId
    val hostname = java.net.InetAddress.getLocalHost().getHostName()
    Iterator(s"Stage: $stageId, Partition: $partId, Host: $hostname")
}).collect.foreach(println)

A similar functionality has been added to PySpark in Spark 2.2.0 (SPARK-18576):

from pyspark import TaskContext
import socket

def task_info(*_):
    ctx = TaskContext()
    return ["Stage: {0}, Partition: {1}, Host: {2}".format(
        ctx.stageId(), ctx.partitionId(), socket.gethostname())]

for x in sc.parallelize([], 4).mapPartitions(task_info).collect():
    print(x)
answered Nov 20, 2018 by Frankie
• 9,810 points

Related Questions In Apache Spark

0 votes
1 answer
0 votes
1 answer

How do I access the Map Task ID in Spark?

You can access task information using TaskContext: import org.apache.spark.TaskContext sc.parallelize(Seq[Int](), ...READ MORE

answered 18 hours ago in Apache Spark by ravikiran
• 3,720 points
8 views
0 votes
1 answer

How to get the number of elements in partition?

rdd.mapPartitions(iter => Array(iter.size).iterator, true) This command will ...READ MORE

answered May 8, 2018 in Apache Spark by kurt_cobain
• 9,240 points
112 views
0 votes
7 answers

How to print the contents of RDD in Apache Spark?

Simple and easy: line.foreach(println) READ MORE

answered Dec 10, 2018 in Apache Spark by Kuber
7,193 views
0 votes
0 answers
0 votes
1 answer

Is it possible to run Apache Spark without Hadoop?

Though Spark and Hadoop were the frameworks designed ...READ MORE

answered May 2 in Big Data Hadoop by ravikiran
• 3,720 points
47 views
0 votes
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,210 points
2,083 views
0 votes
10 answers

hadoop fs -put command?

put syntax: put <localSrc> <dest> copy syntax: copyFr ...READ MORE

answered Dec 7, 2018 in Big Data Hadoop by Aditya
10,697 views
0 votes
1 answer

How to open/stream .zip files through Spark?

You can try and check this below ...READ MORE

answered Nov 20, 2018 in Apache Spark by Frankie
• 9,810 points
233 views
0 votes
1 answer

What is Executor Memory in a Spark application?

Every spark application has same fixed heap ...READ MORE

answered Jan 4 in Apache Spark by Frankie
• 9,810 points
161 views