How to print the contents of an RDD in Apache Spark?

0 votes

I want to output the contents of a collection to the Spark console.

I have a type:

linesWithSessionId: org.apache.spark.rdd.RDD[String] = FilteredRDD[3]
And I use the command:

scala> linesWithSessionId.map(line => println(line))
But only this is printed:

res1: org.apache.spark.rdd.RDD[Unit] = MappedRDD[4] at map at :19

How can I write the RDD to console or save it to disk so I can view its contents?

Jul 6, 2018 in Apache Spark by Shubham
• 12,810 points
5,919 views

7 answers to this question.

0 votes

If you want to view the contents of an RDD, one way is to use collect():

myRDD.collect().foreach(println)
That's not a good idea, though, when the RDD has billions of lines, because collect() pulls every element into the driver's memory. Use take(n) to fetch and print just the first n elements:

myRDD.take(n).foreach(println)
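
As a rough sketch of the difference between the two calls, assuming a local SparkSession and a small made-up RDD (the sample data and the app name are hypothetical; in spark-shell, `spark` and `sc` already exist, so the session setup can be skipped):

```scala
import org.apache.spark.sql.SparkSession

// Assumption: local mode, for illustration only.
val spark = SparkSession.builder().master("local[*]").appName("print-rdd").getOrCreate()
val sc = spark.sparkContext

// Hypothetical sample data standing in for the real RDD.
val myRDD = sc.parallelize(Seq("a", "b", "c", "d", "e"))

// collect() copies every element to the driver: fine for small RDDs only.
val all: Array[String] = myRDD.collect()
all.foreach(println)

// take(n) brings only the first n elements to the driver.
val firstTwo: Array[String] = myRDD.take(2)
firstTwo.foreach(println)
```

Because both calls return a plain Array on the driver, the println output always appears in the console you are typing in.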

Hope this will help you.

answered Jul 6, 2018 by nitinrawat895
• 9,450 points
0 votes
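  • The map function is a transformation, which means Spark does not evaluate it until you run an action on the resulting RDD. That is why your command only returned a new RDD[Unit] and printed nothing.
  • To print it, you can use foreach (which is an action): linesWithSessionId.foreach(println). On a cluster, the output goes to the executors' stdout rather than to the driver console.
  • To write it to disk, you can use one of the saveAs... functions (also actions) from the RDD API, such as saveAsTextFile.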
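
The transformation/action distinction above can be sketched as follows. This is only an illustration under assumptions: the session runs in local mode, and the sample lines are invented to stand in for the asker's linesWithSessionId.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: local mode, for illustration only.
val spark = SparkSession.builder().master("local[*]").appName("lazy-map").getOrCreate()
val sc = spark.sparkContext

// Hypothetical sample data standing in for the real RDD.
val linesWithSessionId = sc.parallelize(Seq("s1 login", "s2 view", "s1 logout"))

// Transformation: this only describes a new RDD[Unit]; no job runs yet.
val mapped = linesWithSessionId.map(line => println(line))

// Action: this triggers a job. In local mode the lines show up in this
// console; on a cluster they are written to the executors' stdout.
linesWithSessionId.foreach(println)

// To be sure the output lands on the driver, fetch a bounded sample first.
val sample = linesWithSessionId.take(10)
sample.foreach(println)
```

Running an action on `mapped` (for example `mapped.count()`) would finally execute the println inside the map, which is rarely what you want; foreach or take is the more direct route.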
answered Aug 6, 2018 by zombie
• 3,690 points
0 votes

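Here's another way, using toArray(). Note that toArray() does the same thing as collect(), was deprecated in its favor, and is no longer available in Spark 2.x, so prefer collect() on current versions: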

linesWithSessionId.toArray().foreach(line => println(line))
answered Dec 10, 2018 by jisen
0 votes

You can first convert it to a DataFrame and then print it. By default, show() displays only the first 20 rows and truncates long values:

line.toDF().show()
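
A minimal sketch of the DataFrame route, assuming a local SparkSession and invented sample data. Outside spark-shell, toDF is only available on an RDD after importing the session's implicits; the column name "value" is chosen here purely for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: local mode, for illustration only.
val spark = SparkSession.builder().master("local[*]").appName("rdd-to-df").getOrCreate()
import spark.implicits._ // brings toDF into scope for RDDs

// Hypothetical sample data standing in for the real RDD.
val line = spark.sparkContext.parallelize(Seq("first", "second", "third"))

// Convert to a single-column DataFrame and print it as a table.
val df = line.toDF("value")
df.show(5, truncate = false) // at most 5 rows, without shortening long strings
```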
answered Dec 10, 2018 by Nahoju
0 votes

Save it to a text file (Spark creates a directory with this name that holds one part file per partition):

line.saveAsTextFile("alicia.txt")

Print the contents of the text file:
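
The command for this step is not given in the answer. One possible way, sketched here under assumptions (local mode, invented sample data, and a temporary output location instead of the answer's relative path), is to read the saved directory back with sc.textFile and print a bounded number of lines:

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Assumption: local mode, for illustration only.
val spark = SparkSession.builder().master("local[*]").appName("save-rdd").getOrCreate()
val sc = spark.sparkContext

// Hypothetical sample data standing in for the real RDD.
val line = sc.parallelize(Seq("first", "second", "third"))

// saveAsTextFile fails if the target already exists, so use a fresh path.
val outDir = Files.createTempDirectory("rdd-demo").resolve("alicia.txt").toString
line.saveAsTextFile(outDir)

// Reading the directory picks up all part files inside it.
val readBack = sc.textFile(outDir)
readBack.take(10).foreach(println)
```

You can also open the part-* files inside the output directory with any text viewer.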

answered Dec 10, 2018 by Akshay
0 votes
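print(line.take(n))

This prints a readable list in PySpark. In Scala, printing an Array shows only its object reference, so use line.take(n).foreach(println) or print(line.take(n).mkString(", ")) instead.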
answered Dec 10, 2018 by Rahul
0 votes

Simple and easy:

line.foreach(println)
answered Dec 10, 2018 by Kuber

