Unable to get the job status and group ID in a Java Spark standalone program with Databricks

package com.dataguise.test;

import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

import org.apache.spark.SparkContext;
import org.apache.spark.SparkJobInfo;
import org.apache.spark.SparkStageInfo;
import org.apache.spark.SparkStatusTracker;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.JavaSparkStatusTracker;
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkAppHandle.State;
import org.apache.spark.launcher.SparkLauncher;
import org.apache.spark.scheduler.SparkListener;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;

import com.google.gson.Gson;
import com.google.gson.GsonBuilder;

public class Dataframetest {

    public static void main(String[] args) throws IOException, InterruptedException {

        SparkSession sess = SparkSession.builder().appName("dataframetest").master("local[*]").getOrCreate();

        // key and value are placeholders for the actual configuration entries
        sess.conf().set(key, value);
        sess.sparkContext().hadoopConfiguration().set(key, value);

        Gson gson = new GsonBuilder().setPrettyPrinting().create();

        // Comma-separated list of input paths
        String inputPaths = "abfss://folder1/testing.orc";
        String[] inputFiles = inputPaths.split(",");

        // Read the ORC files, adding the source file name and an empty metadata column
        Dataset<Row> csvRead = sess.read().format("orc").load(inputFiles)
                .withColumn("dg_filename", org.apache.spark.sql.functions.input_file_name())
                .withColumn("dg_metadata", org.apache.spark.sql.functions.lit(null).cast(DataTypes.StringType));

        csvRead.show(1000, false);
    }
}


With this program we are able to submit the job to the cluster and it completes successfully, but I am not able to get the job status and group ID in the code. I need the job status inside the program for internal use.
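One way to tie the jobs to a known group ID from inside the driver (a minimal sketch, not something verified on the cluster; the group name and the count() action are only placeholders) is to set the job group explicitly before triggering an action and then ask the status tracker for that group:

JavaSparkContext jsc = JavaSparkContext.fromSparkContext(sess.sparkContext());
// Tag every job triggered after this call with a known group ID
jsc.setJobGroup("dg-orc-read", "reading ORC input files");

csvRead.count();   // any action; the jobs it triggers carry the group ID set above

JavaSparkStatusTracker tracker = jsc.statusTracker();
for (int jobId : tracker.getJobIdsForGroup("dg-orc-read")) {
    SparkJobInfo jobInfo = tracker.getJobInfo(jobId);
    if (jobInfo != null) {
        System.out.println("Job " + jobId + " in group dg-orc-read: " + jobInfo.status().name());
    }
}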

Can anyone please help me with this?

Jul 23 in Apache Spark by kamboj

Hi, @Kamboj,

Are you facing any kind of error along the way, or are you just not able to get the job status and group ID in the code?

Hi @Gitika,

I am not facing any kind of error. I am just not sure how to get the job status, job ID, etc. in the code (using Spark with Databricks). I have also used the code below in my program.

JavaSparkContext jsc = JavaSparkContext.fromSparkContext(sess.sparkContext());
JavaSparkStatusTracker statusTracker = jsc.statusTracker();

int[] activeJobIds = statusTracker.getActiveJobIds();
for (int jobId : activeJobIds) {
    SparkJobInfo jobInfo = statusTracker.getJobInfo(jobId);
    System.out.println("Job " + jobId + " status is " + jobInfo.status().name());
    System.out.println("Stages status:");

    for (int stageId : jobInfo.stageIds()) {
        SparkStageInfo stageInfo = statusTracker.getStageInfo(stageId);

        System.out.println("Stage id=" + stageId + "; name = " + stageInfo.name()
                + "; completed tasks: " + stageInfo.numCompletedTasks()
                + "; active tasks: " + stageInfo.numActiveTasks()
                + "; all tasks: " + stageInfo.numTasks()
                + "; submission time: " + stageInfo.submissionTime());
    }
}

However, the method statusTracker.getActiveJobIds() does not return any job IDs.
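A likely reason (an assumption, not something confirmed in this thread) is that the status tracker only sees jobs while they are running, and the snippet above queries it on the same thread either before any action has started or after show() has already finished. A minimal sketch of polling from the driver while the action runs on a separate thread, reusing the csvRead Dataset from the question:

JavaSparkContext jsc = JavaSparkContext.fromSparkContext(sess.sparkContext());
JavaSparkStatusTracker statusTracker = jsc.statusTracker();

// Run the action on a worker thread so the main thread can poll while jobs are active
Thread action = new Thread(() -> csvRead.show(1000, false));
action.start();

while (action.isAlive()) {
    for (int jobId : statusTracker.getActiveJobIds()) {
        SparkJobInfo jobInfo = statusTracker.getJobInfo(jobId);
        if (jobInfo != null) {
            System.out.println("Job " + jobId + " is " + jobInfo.status().name());
        }
    }
    Thread.sleep(500);   // poll twice a second; main() already declares InterruptedException
}
action.join();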

Hi,

Go through the Databricks Jobs API documentation; you will find some examples there.

https://docs.databricks.com/dev-tools/api/latest/jobs.html

No success. Can anyone help me get the status of the Spark job?

@Kamboj,

You can run Databricks jobs CLI subcommands by appending them to databricks jobs, and job run commands by appending them to databricks runs.

Bash
databricks jobs -h

Is there any option to get the job status by using Databricks or Spark classes or methods, like I have been trying in my code snippet? I am using the JavaSparkStatusTracker class but am not getting the job status.

I don't know whether it will work that way or not, but if your requirement is to find the job ID and status, then you can use the databricks command in a script and run it.

Thanks for the quick response, MD.

Could you please elaborate a little more on how I can use it and how it will help me get the status of a particular job? I think it will return results for all the running jobs. What if I need the status of one particular job ID?
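As a sketch of that CLI approach (the job and run IDs below are only placeholders), the legacy Databricks jobs CLI can target a single run: databricks runs list filters runs by job, and databricks runs get returns the state of one run by its ID.

Bash
databricks jobs list
databricks runs list --job-id <job-id>
databricks runs get --run-id <run-id>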

Hi,

You can track the status of jobs from inside the application by registering a SparkListener with SparkContext.addSparkListener. You can go through the link below for similar examples.

https://stackoverflow.com/questions/27165194/how-to-get-spark-job-status-from-program
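A minimal sketch of that listener approach (assuming the sess SparkSession from the question; SparkListenerJobStart and SparkListenerJobEnd come from org.apache.spark.scheduler, and the spark.jobGroup.id property is only populated when a job group has been set):

// Register a listener so every job start/end is reported, including its group ID
sess.sparkContext().addSparkListener(new SparkListener() {
    @Override
    public void onJobStart(SparkListenerJobStart jobStart) {
        String group = jobStart.properties().getProperty("spark.jobGroup.id");
        System.out.println("Job " + jobStart.jobId() + " started, group = " + group);
    }

    @Override
    public void onJobEnd(SparkListenerJobEnd jobEnd) {
        System.out.println("Job " + jobEnd.jobId() + " finished: " + jobEnd.jobResult());
    }
});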

