How do I access the Map Task ID in Spark?

0 votes
Assuming all the Map Tasks are user-defined, Is there a way that I could access the Map task ID?
Jul 23 in Apache Spark by nitinrawat895
• 10,670 points
35 views

1 answer to this question.

0 votes

You can access task information using TaskContext:

import org.apache.spark.TaskContext

sc.parallelize(Seq[Int](), 4).mapPartitions(_ => {
    val ctx = TaskContext.get
    val stageId = ctx.stageId
    val partId = ctx.partitionId
    val hostname = java.net.InetAddress.getLocalHost().getHostName()
    Iterator(s"Stage: $stageId, Partition: $partId, Host: $hostname")
}).collect.foreach(println)

A similar functionality has been added to PySpark in Spark 2.2.0 (SPARK-18576):

from pyspark import TaskContext
import socket

def task_info(*_):
    ctx = TaskContext()
    return ["Stage: {0}, Partition: {1}, Host: {2}".format(
        ctx.stageId(), ctx.partitionId(), socket.gethostname())]

for x in sc.parallelize([], 4).mapPartitions(task_info).collect():
    print(x)
answered Jul 23 by ravikiran
• 4,560 points

Related Questions In Apache Spark

0 votes
1 answer

How to get ID of a map task in Spark?

you can access task information using TaskContext: import org.apache.spark.TaskContext sc.parallelize(Seq[Int](), ...READ MORE

answered Nov 20, 2018 in Apache Spark by Frankie
• 9,810 points
331 views
0 votes
2 answers

In a Spark DataFrame how can I flatten the struct?

// Collect data from input avro file ...READ MORE

answered Jul 4 in Apache Spark by Dhara dhruve
825 views
0 votes
1 answer

How can I write a text file in HDFS not from an RDD, in Spark program?

Yes, you can go ahead and write ...READ MORE

answered May 29, 2018 in Apache Spark by Shubham
• 13,290 points
1,185 views
0 votes
4 answers

How to change the spark Session configuration in Pyspark?

You can dynamically load properties. First create ...READ MORE

answered Dec 10, 2018 in Apache Spark by Vini
13,402 views
0 votes
1 answer
0 votes
1 answer

Is it possible to run Apache Spark without Hadoop?

Though Spark and Hadoop were the frameworks designed ...READ MORE

answered May 2 in Big Data Hadoop by ravikiran
• 4,560 points
62 views
0 votes
1 answer

What do we exactly mean by “Hadoop” – the definition of Hadoop?

The official definition of Apache Hadoop given ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by Shubham
186 views
0 votes
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,670 points
2,698 views
0 votes
1 answer

How do I turn off INFO Logging in Spark?

Execute this command in the spark directory: cp ...READ MORE

answered Jul 12 in Apache Spark by ravikiran
• 4,560 points
80 views
0 votes
1 answer

How do I connect to a HIVE Meta store through a program in SparkSQL?

In spark 2.0.+ it should look something ...READ MORE

answered Sep 5 in Apache Spark by ravikiran
• 4,560 points
25 views