How to print the contents of an RDD in Apache Spark?

0 votes

I want to output the contents of a collection to the Spark console.

I have a type:

linesWithSessionId: org.apache.spark.rdd.RDD[String] = FilteredRDD[3]
And I use the command:

scala> linesWithSessionId.map(line => println(line))
But this only prints a reference to the new RDD:

res1: org.apache.spark.rdd.RDD[Unit] = MappedRDD[4] at map at :19

How can I write the RDD to console or save it to disk so I can view its contents?

Jul 6, 2018 in Apache Spark by Shubham
• 13,300 points
11,087 views

7 answers to this question.

0 votes

If you want to view the contents of an RDD, one way is to use collect(), which brings the whole RDD back to the driver:

myRDD.collect().foreach(println)

That's not a good idea, though, when the RDD has billions of lines, since the driver can run out of memory. Use take(n) to print just a few elements:

myRDD.take(n).foreach(println)

Hope this will help you.
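As a sketch of the two approaches above (assuming a SparkContext named sc is in scope, as in spark-shell):

```scala
// Assumes a SparkContext `sc` is available (e.g. in spark-shell).
val myRDD = sc.parallelize(1 to 1000000)

// collect() pulls the entire RDD to the driver -- fine for small data only
myRDD.collect().foreach(println)

// Safer on large RDDs: fetch only the first n elements
val n = 10
myRDD.take(n).foreach(println)
```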

answered Jul 6, 2018 by nitinrawat895
• 10,690 points
0 votes
  • The map function is a transformation, which means that Spark will not actually evaluate your RDD until you run an action on it.
  • To print it, you can use foreach (which is an action): linesWithSessionId.foreach(println). Note that on a cluster this prints to the executors' stdout, not the driver console; use linesWithSessionId.collect().foreach(println) to see the output on the driver.
  • To write it to disk you can use one of the saveAs... functions (still actions) from the RDD API.
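A minimal sketch of the transformation/action distinction described above (app name and output path are placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: app name, master, and paths are placeholders.
val conf = new SparkConf().setAppName("rdd-print-demo").setMaster("local[*]")
val sc = new SparkContext(conf)

val linesWithSessionId = sc.parallelize(Seq("session-1 foo", "session-2 bar"))

// map is lazy: nothing is computed until an action is called
val upper = linesWithSessionId.map(_.toUpperCase)

// foreach is an action; in local mode this prints to the console
upper.foreach(println)

// saveAsTextFile is also an action; it writes a *directory* of part files
upper.saveAsTextFile("/tmp/rdd-print-demo-output")

sc.stop()
```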
answered Aug 6, 2018 by zombie
• 3,690 points
0 votes

Here's another way:

linesWithSessionId.toArray().foreach(line => println(line))

Note that RDD.toArray() is deprecated; prefer collect() in recent Spark versions.
answered Dec 10, 2018 by jisen
0 votes

You can first convert it to a DataFrame and then print it:

line.toDF().show()

(This requires a SparkSession and import spark.implicits._ to be in scope.)
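A minimal sketch of this DataFrame approach, assuming a local SparkSession:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("rdd-to-df-demo")
  .master("local[*]")
  .getOrCreate()

// needed for the toDF() implicit conversion
import spark.implicits._

val line = spark.sparkContext.parallelize(Seq("alpha", "beta", "gamma"))

// show() prints the first rows in tabular form on the driver
line.toDF("value").show()

spark.stop()
```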
answered Dec 10, 2018 by Nahoju
0 votes

Save it to a text file:

line.saveAsTextFile("alicia.txt")

Note that saveAsTextFile creates a directory named alicia.txt containing one part file per partition, not a single file; view those part files to see the contents.
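A sketch of saving and then re-reading the output to print it (assuming a SparkContext named sc; the path is a placeholder):

```scala
// Assumes a SparkContext `sc` is in scope.
val line = sc.parallelize(Seq("a", "b", "c"))

// Writes a directory "alicia.txt" containing part files, one per partition
line.saveAsTextFile("alicia.txt")

// Read the directory back and print each line on the driver
sc.textFile("alicia.txt").collect().foreach(println)
```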

answered Dec 10, 2018 by Akshay
0 votes
line.take(n).foreach(println)

(In Scala, take(n) returns an Array, so print the elements individually; printing the array itself only shows its object reference.)
answered Dec 10, 2018 by Rahul
0 votes

Simple and easy:

line.foreach(println)

Keep in mind that on a cluster this prints to the executors' stdout, not the driver console; the output only appears locally when running in local mode.
answered Dec 10, 2018 by Kuber
