Setting up checkpoint dir: PySpark Data Science

0 votes

I need to run connectedComponents() from GraphFrames. However, when I try to do this, I get the following error:

Py4JJavaError: An error occurred while calling o221.run.
: java.io.IOException: Checkpoint directory is not set. Please set it first using sc.setCheckpointDir(). 

Can someone please help me set up a checkpoint directory using PySpark in the Data Science Experience tool?

Aug 2 in Data Analytics by Sophie may
• 9,530 points

1 answer to this question.

0 votes

You can follow the steps below:

First, print the current working directory. The checkpoint directory that you later pass to sc.setCheckpointDir() will be created under it:

!pwd

Next, create a directory under that path:

!mkdir <pwd_output>/checkpoints

Set the checkpoint:

spark.sparkContext.setCheckpointDir('<pwd_output>/checkpoints')
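
If you would rather not copy the path by hand, the same three steps can be sketched in plain Python. This is only a minimal sketch: the helper names make_checkpoint_dir and set_checkpoint_dir are hypothetical (they are not part of Spark or GraphFrames), and it assumes the notebook already provides a session called spark. Note also that a local path like this is only suitable when Spark runs in local mode; on a cluster the checkpoint directory generally needs to be on HDFS or another shared filesystem.

```python
import os


def make_checkpoint_dir(base_dir=None, name="checkpoints"):
    """Create <base_dir>/<name> if it does not exist and return its absolute path.

    Hypothetical helper for illustration; not part of Spark or GraphFrames.
    """
    base_dir = base_dir or os.getcwd()        # same value that !pwd prints
    path = os.path.join(os.path.abspath(base_dir), name)
    os.makedirs(path, exist_ok=True)          # no error if it already exists
    return path


def set_checkpoint_dir(sc, base_dir=None):
    """Ensure the directory exists, then register it on the given SparkContext."""
    path = make_checkpoint_dir(base_dir)
    sc.setCheckpointDir(path)                 # the call named in the error message
    return path


# In the notebook (assuming `spark` is the session provided by the environment):
# checkpoint_path = set_checkpoint_dir(spark.sparkContext)
# After that, connectedComponents() should no longer complain about the checkpoint dir.
```

Using os.makedirs with exist_ok=True means the cell can be re-run safely, whereas !mkdir fails the second time because the directory is already there.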
answered Aug 2 by Zulaikha
• 840 points
