Setting up checkpoint dir: PySpark Data Science

0 votes

I need to run connectedComponents() from GraphFrames. However, when I try to do this, I get the following error:

Py4JJavaError: An error occurred while calling o221.run.
: java.io.IOException: Checkpoint directory is not set. Please set it first using sc.setCheckpointDir(). 

Can someone please help me set up a checkpoint directory using PySpark in the Data Science Experience tool?

Aug 2, 2019 in Data Analytics by Sophie may
• 9,920 points
215 views

1 answer to this question.

0 votes

You can follow the below steps:

Find the current working directory; the checkpoint directory that you pass to sc.setCheckpointDir() will be created under it:

!pwd

Next, create a directory under that path:

!mkdir <pwd_output>/checkpoints

Set the checkpoint directory:

spark.sparkContext.setCheckpointDir('<pwd_output>/checkpoints')
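
The same three steps can also be done from Python instead of shell commands. This is only a minimal sketch: the helper names prepare_checkpoint_dir and set_checkpoint_dir are made up for illustration, and it assumes the notebook already provides a SparkSession called spark and that a directory on the local filesystem is acceptable in your environment (on a cluster, Spark expects the checkpoint directory to be on an HDFS-compatible filesystem).

```python
import os


def prepare_checkpoint_dir(base_dir=None, name="checkpoints"):
    """Create <base_dir>/<name> if it does not exist and return its absolute path.

    Hypothetical helper: base_dir defaults to the current working
    directory, i.e. the same value that `!pwd` prints in the notebook.
    """
    base_dir = base_dir or os.getcwd()
    path = os.path.join(os.path.abspath(base_dir), name)
    os.makedirs(path, exist_ok=True)  # no error if the directory already exists
    return path


def set_checkpoint_dir(spark, path):
    """Point an existing SparkSession at the checkpoint directory."""
    spark.sparkContext.setCheckpointDir(path)


# Usage in the notebook (assumes `spark` is already defined there):
# checkpoint_path = prepare_checkpoint_dir()
# set_checkpoint_dir(spark, checkpoint_path)
# After this, connectedComponents() should no longer raise the
# "Checkpoint directory is not set" error.
```

Using os.makedirs with exist_ok=True means the cell can be re-run without failing when the directory is already there, which a plain mkdir would not allow.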
answered Aug 2, 2019 by Zulaikha
• 890 points
