Setting up a checkpoint directory in PySpark (Data Science Experience)


I need to run connectedComponents() from GraphFrames. However, when I try to do this, I get the following error:

Py4JJavaError: An error occurred while calling o221.run.
: java.io.IOException: Checkpoint directory is not set. Please set it first using sc.setCheckpointDir(). 

Can someone please help me set up the checkpoint directory using PySpark in the Data Science Experience tool?

Aug 2, 2019 in Data Analytics by Sophie may


You can follow the steps below:

First, print the current working directory so you know where to create the directory that you will pass to sc.setCheckpointDir():

!pwd
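If you would rather not rely on the notebook's "!" shell escape, plain Python gives you the same path. This is a minimal sketch using only the standard library; the variable name working_dir is just for illustration.

```python
import os

# Python equivalent of the notebook's "!pwd": the current working directory.
working_dir = os.getcwd()
print(working_dir)
```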

Next, create a directory under that path:

!mkdir <pwd_output>/checkpoints
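The same step can be done from Python, which avoids copying the pwd output by hand. A minimal sketch, assuming you want the directory to be named "checkpoints" under the current working directory as in the command above:

```python
import os

# Build the checkpoint path under the current working directory
# (the location that "<pwd_output>/checkpoints" refers to).
checkpoint_dir = os.path.join(os.getcwd(), "checkpoints")

# exist_ok=True makes this safe to re-run, unlike a plain "mkdir",
# which fails if the directory already exists.
os.makedirs(checkpoint_dir, exist_ok=True)
print(checkpoint_dir)
```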

Finally, register that directory as the checkpoint directory:

spark.sparkContext.setCheckpointDir('<pwd_output>/checkpoints')
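The three steps can be wrapped in one helper so the directory is created and registered before calling connectedComponents(). This is a sketch, not part of PySpark or GraphFrames: the function name set_checkpoint_dir and its base_dir and name parameters are hypothetical. It only assumes the object you pass in has a setCheckpointDir(path) method, as pyspark's SparkContext does.

```python
import os


def set_checkpoint_dir(spark_context, base_dir=None, name="checkpoints"):
    """Hypothetical helper: create a checkpoint directory and register it.

    spark_context is expected to be a pyspark SparkContext
    (for example spark.sparkContext). base_dir defaults to the current
    working directory, mirroring the "!pwd" step. Returns the path set.
    """
    base_dir = base_dir or os.getcwd()
    checkpoint_dir = os.path.join(base_dir, name)
    # Equivalent of "!mkdir", but safe to re-run.
    os.makedirs(checkpoint_dir, exist_ok=True)
    # Equivalent of spark.sparkContext.setCheckpointDir(...).
    spark_context.setCheckpointDir(checkpoint_dir)
    return checkpoint_dir


# Example usage in a notebook (assumes an existing SparkSession named
# "spark" and a GraphFrame named "g"; not executed here):
#
#   set_checkpoint_dir(spark.sparkContext)
#   components = g.connectedComponents()
#   components.show()
```

A local directory like this works when Spark runs in local mode. If your notebook is attached to a multi-node cluster, Spark's documentation says the checkpoint directory must be an HDFS path (or other storage shared by all executors), so pass that path instead of a local one.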
answered Aug 2, 2019 by Zulaikha