How to save and retrieve a Spark RDD from HDFS?

0 votes
I am performing some analytics in Spark. I want to save an RDD to HDFS so that I can read it back later for further processing. How can I do that?
May 29, 2018 in Apache Spark by code799
1,433 views

1 answer to this question.

0 votes

You can save an RDD with the saveAsTextFile or saveAsObjectFile method, and read it back with the matching method on SparkContext: textFile for text files and objectFile for object files (sequenceFile reads Hadoop SequenceFiles written with saveAsSequenceFile).

​rdd.saveAsTextFile ("hdfs:/localhost:9000/abc/");
val loadRdds = sparkContext.textFile("hdfs:/localhost:9000/abc/").map (x => {})
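If the RDD holds something other than plain strings, saveAsObjectFile together with SparkContext.objectFile keeps the element type across the round trip. A minimal sketch, assuming an RDD of (String, Int) pairs and an illustrative HDFS path hdfs://localhost:9000/pairs/ (not from the original question):

// Save an RDD of pairs as serialized Java objects
val pairs = sparkContext.parallelize(Seq(("a", 1), ("b", 2)))
pairs.saveAsObjectFile("hdfs://localhost:9000/pairs/")

// Read it back later; the element type must be given explicitly
val restored = sparkContext.objectFile[(String, Int)]("hdfs://localhost:9000/pairs/")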
answered May 29, 2018 by Shubham
• 13,110 points
