Setting up checkpoint dir: PySpark Data Science

0 votes

I need to run connectedComponents() from GraphFrames. However, when I call it, I get the following error:

Py4JJavaError: An error occurred while calling o221.run.
: java.io.IOException: Checkpoint directory is not set. Please set it first using sc.setCheckpointDir(). 

Can someone please help me set up a checkpoint directory using PySpark in the Data Science Experience tool?

Aug 2, 2019 in Data Analytics by Sophie may
• 10,100 points
1,412 views

1 answer to this question.

0 votes

You can follow these steps:

First, find the current working directory. The checkpoint directory that you pass to sc.setCheckpointDir() will be created under it:

!pwd

Next, create a directory under that path:

!mkdir <pwd_output>/checkpoints

Finally, set the checkpoint directory:

spark.sparkContext.setCheckpointDir('<pwd_output>/checkpoints')
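
If you would rather not copy the pwd output by hand, the three steps above can be sketched in plain Python. This is only a minimal sketch: the helper name set_local_checkpoint_dir and the folder name "checkpoints" are my own choices for illustration, and it assumes that a local path is visible to your Spark executors. On a real cluster the checkpoint directory generally needs to be on a shared filesystem such as HDFS.

```python
import os


def set_local_checkpoint_dir(sc, base_dir=None, name="checkpoints"):
    """Create <base_dir>/<name> if needed and register it with Spark.

    `sc` is expected to be a SparkContext (anything exposing
    setCheckpointDir works). The helper itself is hypothetical and
    not part of PySpark or GraphFrames.
    """
    # Default to the current working directory, i.e. what `!pwd` prints.
    base_dir = base_dir or os.getcwd()
    path = os.path.join(os.path.abspath(base_dir), name)
    # Same effect as `!mkdir`, but does not fail if the folder already exists.
    os.makedirs(path, exist_ok=True)
    # SparkContext.setCheckpointDir is the call named in the error message.
    sc.setCheckpointDir(path)
    return path
```

In a notebook where a SparkSession named spark already exists, you would call set_local_checkpoint_dir(spark.sparkContext) once before running connectedComponents() on your graph.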
answered Aug 2, 2019 by Zulaikha
• 910 points
