How to print the contents of RDD in Apache Spark?

0 votes

I want to output the contents of a collection to the Spark console.

I have a type:

linesWithSessionId: org.apache.spark.rdd.RDD[String] = FilteredRDD[3]
And I use the command:

scala> linesWithSessionId.map(line => println(line))
But this prints:

res1: org.apache.spark.rdd.RDD[Unit] = MappedRDD[4] at map at :19

How can I write the RDD to console or save it to disk so I can view its contents?

Jul 6, 2018 in Apache Spark by Shubham
• 13,450 points
32,642 views

7 answers to this question.

0 votes

If you want to view the contents of an RDD, one way is to use collect(), which pulls the entire dataset back to the driver:

myRDD.collect().foreach(println)

That's not a good idea, though, when the RDD has billions of lines, since the driver can run out of memory. Use take() to print just a few elements:

myRDD.take(n).foreach(println)

Hope this will help you.
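The reason take(n) is cheap can be sketched with a plain-Scala analogy (no Spark required; the counter and object name below are purely illustrative): a lazy iterator, like an RDD, only produces the elements that are actually requested.

```scala
object TakeSketch {
  // Counts how many elements an unbounded lazy source actually produces
  // when only n of them are requested.
  var produced = 0

  def run(n: Int): Int = {
    produced = 0
    // An unbounded "dataset": each element is produced on demand.
    val data = Iterator.from(1).map { x => produced += 1; x }
    // Like rdd.take(n): only n elements are ever materialized.
    data.take(n).foreach(_ => ())
    produced
  }
}
```

Calling TakeSketch.run(5) produces exactly 5 elements even though the source is infinite, which mirrors why take(n) is safe on a huge RDD while collect() is not.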

answered Jul 6, 2018 by nitinrawat895
• 10,950 points
0 votes
  • The map function is a transformation, which means that Spark will not actually evaluate your RDD until you run an action on it.
  • To print it, you can use foreach (which is an action):

    linesWithSessionId.foreach(println)

  • To write it to disk, you can use one of the saveAs... functions (still actions) from the RDD API.
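The transformation-vs-action distinction above can be sketched in plain Scala without Spark (LazyList stands in for an RDD here; the counter is purely illustrative): map does nothing until something forces the evaluation.

```scala
object LazySketch {
  // Returns (evaluations before the "action", evaluations after it).
  def run(): (Int, Int) = {
    var evaluated = 0
    val data = LazyList(1, 2, 3)
    // Like an RDD transformation: nothing is computed yet.
    val mapped = data.map { x => evaluated += 1; x * 2 }
    val before = evaluated    // still 0, because map was lazy
    mapped.foreach(_ => ())   // like an RDD action: forces evaluation
    (before, evaluated)
  }
}
```

This is why `linesWithSessionId.map(line => println(line))` in the question printed nothing but a type signature: the map was never executed.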
answered Aug 6, 2018 by zombie
• 3,750 points
0 votes

Here's another way, using toArray() (deprecated since Spark 1.0; prefer collect()):

linesWithSessionId.toArray().foreach(line => println(line))
answered Dec 10, 2018 by jisen
0 votes

You can first convert it to a DataFrame and then print it (this needs the toDF implicit in scope, e.g. import spark.implicits._):

line.toDF().show()
answered Dec 10, 2018 by Nahoju
+1 vote

Save it to a text file (note that saveAsTextFile creates a directory named alicia.txt containing part files, not a single file):

line.saveAsTextFile("alicia.txt")

Print the contents of the saved text file:

sc.textFile("alicia.txt").collect().foreach(println)

answered Dec 10, 2018 by Akshay
–1 vote
println(line.take(n).mkString("\n"))  // take(n) returns an Array, so format it before printing
answered Dec 10, 2018 by Rahul
–1 vote

Simple and easy:

line.foreach(println)

Note that on a cluster this prints to the executors' stdout, not the driver console; collect() first if you need the output locally.
answered Dec 10, 2018 by Kuber
