Unable to get the job status and group ID in a Java Spark standalone program with Databricks

package com.dataguise.test;

import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

import org.apache.spark.SparkContext;
import org.apache.spark.SparkJobInfo;
import org.apache.spark.SparkStageInfo;
import org.apache.spark.SparkStatusTracker;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.JavaSparkStatusTracker;
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkAppHandle.State;
import org.apache.spark.launcher.SparkLauncher;
import org.apache.spark.scheduler.SparkListener;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;

import com.google.gson.Gson;
import com.google.gson.GsonBuilder;

public class Dataframetest {

    public static void main(String[] args) throws IOException, InterruptedException {
        SparkSession sess = SparkSession.builder().appName("dataframetest").master("local[*]").getOrCreate();

        // key and value are placeholders; the actual settings are not shown in this post.
        sess.conf().set(key, value);
        sess.sparkContext().hadoopConfiguration().set(key, value);

        Gson gson = new GsonBuilder().setPrettyPrinting().create();

        String inputPaths = "abfss://folder1/testing.orc";
        String[] inputFiles = inputPaths.split(",");

        // Read the ORC input and add the source file name plus an empty metadata column.
        Dataset<Row> csvRead = sess.read().format("orc").load(inputFiles)
                .withColumn("dg_filename", org.apache.spark.sql.functions.input_file_name())
                .withColumn("dg_metadata", org.apache.spark.sql.functions.lit(null).cast(DataTypes.StringType));

        csvRead.show(1000, false);
    }
}


With this program, we can submit the job to the cluster and it completes successfully. However, I am not able to get the job status and group ID in the code. I need the job status inside the program for internal use.

Can anyone please help me with this?

Hi @Kamboj,

Are you getting any error along the way, or are you just unable to get the job status and group ID in the code?

Hi @Gitika,

I am not getting any error. I am just not sure how to get the job status, job ID, and so on in the code (using Spark with Databricks). I have also tried the code below in my program.

    JavaSparkContext jsc = JavaSparkContext.fromSparkContext(sess.sparkContext());
    JavaSparkStatusTracker statusTracker = jsc.statusTracker();
    int[] activeJobIds = statusTracker.getActiveJobIds();
    for (int jobId : activeJobIds) {
        SparkJobInfo jobInfo = statusTracker.getJobInfo(jobId);
        if (jobInfo == null) {
            continue; // no information is available for this job
        }
        System.out.println("Job " + jobId + " status is " + jobInfo.status().name());
        System.out.println("Stages status:");

        for (int stageId : jobInfo.stageIds()) {
            SparkStageInfo stageInfo = statusTracker.getStageInfo(stageId);
            if (stageInfo == null) {
                continue; // no information is available for this stage
            }
            System.out.println("Stage id=" + stageId + "; name = " + stageInfo.name()
                    + "; completed tasks: " + stageInfo.numCompletedTasks()
                    + "; active tasks: " + stageInfo.numActiveTasks()
                    + "; all tasks: " + stageInfo.numTasks()
                    + "; submission time: " + stageInfo.submissionTime());
        }
    }

However, the method "statusTracker.getActiveJobIds()" returns a null value.
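One likely reason getActiveJobIds() reports no jobs here: an action such as show() blocks until its job has finished, so when the tracker is queried afterwards on the same thread, nothing is active any more. Below is a minimal sketch of one way around that, assuming a local session: the action runs on a worker thread under a job group, and the main thread polls the tracker with getJobIdsForGroup(), which also reports jobs that have already completed. The class name, the group ID "demo-group", and the sample range/count action are made up for illustration and stand in for the real read.

```java
import org.apache.spark.SparkJobInfo;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.JavaSparkStatusTracker;
import org.apache.spark.sql.SparkSession;

public class JobGroupStatusSketch {

    /** Runs a sample action under groupId on a worker thread, polls its jobs, and returns the known job IDs. */
    static int[] runAndTrack(SparkSession sess, String groupId) throws InterruptedException {
        JavaSparkContext jsc = JavaSparkContext.fromSparkContext(sess.sparkContext());
        JavaSparkStatusTracker tracker = jsc.statusTracker();

        Thread worker = new Thread(() -> {
            // The job group is a thread-local property, so set it on the thread that triggers the action.
            jsc.setJobGroup(groupId, "status tracker sketch");
            // Hypothetical stand-in for the real action (for example csvRead.show(...)).
            sess.range(0, 20_000_000L).filter("id % 7 = 0").count();
        });
        worker.start();

        // Poll while the action is still running.
        while (worker.isAlive()) {
            printStatuses(tracker, groupId);
            Thread.sleep(200);
        }
        worker.join();

        // Jobs in the group stay visible after they finish, so the final status can still be read.
        printStatuses(tracker, groupId);
        return tracker.getJobIdsForGroup(groupId);
    }

    /** Prints the current status of every job known for the given group. */
    static void printStatuses(JavaSparkStatusTracker tracker, String groupId) {
        for (int jobId : tracker.getJobIdsForGroup(groupId)) {
            SparkJobInfo info = tracker.getJobInfo(jobId);
            if (info != null) { // getJobInfo returns null when no information is available
                System.out.println("Group " + groupId + ", job " + jobId + ": " + info.status());
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SparkSession sess = SparkSession.builder()
                .appName("job-group-status-sketch").master("local[*]").getOrCreate();
        int[] jobIds = runAndTrack(sess, "demo-group");
        System.out.println("Jobs seen in group: " + jobIds.length);
        sess.stop();
    }
}
```

The tracker is updated asynchronously, so a status read immediately after the action returns may briefly lag behind; polling a little longer, or using a listener, avoids relying on that timing.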


Go through the Databricks Jobs API documentation. It includes some examples.


No success. Can anyone help me get the status of the Spark job?


You can run Databricks jobs CLI subcommands by appending them to "databricks jobs", and job run subcommands by appending them to "databricks runs".

databricks jobs -h

Is there any option to get the job status by using Databricks Spark classes or methods, as I have been trying in my code snippet? I am using the "JavaSparkStatusTracker" class but not getting the job status.

I don't know whether it will work that way. But if your requirement is to find the job ID and status, you can use the databricks command in a script and run it.

Thanks for the quick response, MD.

Could you please elaborate a little more on how I can use it and how it will help me get the status for a particular ID? I think it will return results for all the running IDs. What if I need the status for one particular job ID?
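For a single ID, the Databricks Jobs REST API has a "runs/get" endpoint that takes a run_id. Here is a rough sketch of calling it from Java 11+ with the built-in HTTP client, assuming the program runs as a Databricks job and that you have a workspace URL and a personal access token. The class name and the environment variable names DATABRICKS_HOST and DATABRICKS_TOKEN are my own choices for illustration. Note that this reports the state of a Databricks job run, not of an individual Spark job inside the application.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DatabricksRunStatusSketch {

    /** Builds a GET request for the Jobs API "runs/get" endpoint for one run ID. */
    static HttpRequest buildRunsGetRequest(String host, String token, long runId) {
        URI uri = URI.create(host + "/api/2.1/jobs/runs/get?run_id=" + runId);
        return HttpRequest.newBuilder(uri)
                .header("Authorization", "Bearer " + token)
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: DatabricksRunStatusSketch <run-id>");
            return;
        }
        // Assumed environment variables: workspace URL (https://...) and access token.
        String host = System.getenv("DATABRICKS_HOST");
        String token = System.getenv("DATABRICKS_TOKEN");
        long runId = Long.parseLong(args[0]);

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRunsGetRequest(host, token, runId), HttpResponse.BodyHandlers.ofString());

        // The JSON body describes the run, including its state; parse it with Gson or similar.
        System.out.println("HTTP " + response.statusCode());
        System.out.println(response.body());
    }
}
```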


You can track the status of jobs from inside the application by registering a SparkListener with SparkContext.addSparkListener. You can go through the link below for similar examples.
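A minimal sketch of the listener approach, assuming a local session: the listener records the job group of each job when it starts and the result when it ends. The class names, the group ID "demo-group", and the sample range/count action are hypothetical; "spark.jobGroup.id" is the property key Spark uses for the job group.

```java
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.spark.scheduler.SparkListener;
import org.apache.spark.scheduler.SparkListenerJobEnd;
import org.apache.spark.scheduler.SparkListenerJobStart;
import org.apache.spark.sql.SparkSession;

public class JobStatusListenerSketch {

    /** Records the job group on job start and the job result on job end. */
    static class JobStatusListener extends SparkListener {
        final Map<Integer, String> groupByJob = new ConcurrentHashMap<>();
        final Map<Integer, String> resultByJob = new ConcurrentHashMap<>();

        @Override
        public void onJobStart(SparkListenerJobStart jobStart) {
            Properties props = jobStart.properties(); // may be null
            String group = props == null ? null : props.getProperty("spark.jobGroup.id");
            groupByJob.put(jobStart.jobId(), group == null ? "(none)" : group);
            System.out.println("Job " + jobStart.jobId() + " started, group=" + group);
        }

        @Override
        public void onJobEnd(SparkListenerJobEnd jobEnd) {
            // jobResult() distinguishes success from failure; here only its string form is kept.
            resultByJob.put(jobEnd.jobId(), String.valueOf(jobEnd.jobResult()));
            System.out.println("Job " + jobEnd.jobId() + " ended: " + jobEnd.jobResult());
        }
    }

    public static void main(String[] args) {
        SparkSession sess = SparkSession.builder()
                .appName("job-status-listener-sketch").master("local[*]").getOrCreate();

        JobStatusListener listener = new JobStatusListener();
        sess.sparkContext().addSparkListener(listener);

        // Tag the jobs triggered from this thread with a group ID.
        sess.sparkContext().setJobGroup("demo-group", "listener sketch", false);

        // Hypothetical stand-in for the real action (for example csvRead.show(...)).
        sess.range(0, 1000).count();

        // Listener events are delivered asynchronously; stopping the session lets pending events drain.
        sess.stop();

        System.out.println("Groups by job: " + listener.groupByJob);
        System.out.println("Results by job: " + listener.resultByJob);
    }
}
```

Unlike polling the status tracker, the callbacks fire for every job regardless of which thread triggered it, so the blocking action can stay on the main thread.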


