Setting up a checkpoint dir in PySpark on Data Science Experience


I need to run connectedComponents() from GraphFrames. However, when I try to do this, I get the following error:

Py4JJavaError: An error occurred while calling o221.run.
: java.io.IOException: Checkpoint directory is not set. Please set it first using sc.setCheckpointDir(). 

Can someone please help me set up the checkpoint directory using PySpark in the Data Science Experience tool?

Aug 2, 2019 in Data Analytics by Sophie may

1 answer to this question.


You can follow the steps below:

First, find the current working directory, so you know where to create the directory you will pass to sc.setCheckpointDir():

!pwd

Next, create a directory under that path:

!mkdir <pwd_output>/checkpoints
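If shell escapes like `!pwd` and `!mkdir` aren't available (for example, when the code runs as a plain Python script instead of in a notebook cell), the same two steps can be done with the standard library. This is a minimal sketch; `create_checkpoint_dir` is a hypothetical helper name, not part of PySpark:

```python
import os


def create_checkpoint_dir(base_dir=None, name="checkpoints"):
    """Create <base_dir>/<name> and return its absolute path.

    Plain-Python stand-in for `!pwd` followed by `!mkdir <pwd_output>/checkpoints`.
    Hypothetical helper, not part of PySpark.
    """
    # With no base_dir, use the same directory that `!pwd` prints.
    base = base_dir if base_dir is not None else os.getcwd()
    path = os.path.abspath(os.path.join(base, name))
    # exist_ok=True behaves like `mkdir -p`: re-running does not raise an error.
    os.makedirs(path, exist_ok=True)
    return path
```

The returned path is what you would then pass to `spark.sparkContext.setCheckpointDir(...)`.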

Then set it as the checkpoint directory:

spark.sparkContext.setCheckpointDir('<pwd_output>/checkpoints')


After that, connectedComponents() should run without the "Checkpoint directory is not set" error.
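The steps above can be wrapped into one helper, so the checkpoint directory is always set before `connectedComponents()` is called. This is a sketch, not a GraphFrames or PySpark API: `connected_components_with_checkpoint` is a hypothetical name, `sc` is assumed to be a `SparkContext` (such as `spark.sparkContext`) and `graph` a `GraphFrame`. The PySpark documentation for `setCheckpointDir` notes that the directory must be an HDFS path when running on a cluster, so the local default used here only suits a single-machine (local mode) session.

```python
import os


def connected_components_with_checkpoint(sc, graph, checkpoint_dir=None):
    """Set a checkpoint directory on `sc`, then run connectedComponents() on `graph`.

    Hypothetical helper (not part of PySpark or GraphFrames).
    sc: assumed to be a pyspark SparkContext, e.g. spark.sparkContext.
    graph: assumed to be a graphframes.GraphFrame.
    checkpoint_dir: defaults to ./checkpoints under the current working
        directory, which only suits local mode; on a cluster pass an HDFS path.
    """
    if checkpoint_dir is None:
        checkpoint_dir = os.path.join(os.getcwd(), "checkpoints")
    # This must happen before connectedComponents(); otherwise the job fails with
    # "Checkpoint directory is not set. Please set it first using sc.setCheckpointDir()."
    sc.setCheckpointDir(checkpoint_dir)
    return graph.connectedComponents()
```

In a notebook this would be called as `connected_components_with_checkpoint(spark.sparkContext, GraphFrame(vertices, edges))`, where `vertices` and `edges` stand for your own DataFrames. Keeping the two calls in one function means the ordering can't be forgotten when the cell is rerun in a fresh session.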


Thanks.

answered Aug 2, 2019 by Zulaikha
