How to print the contents of RDD in Apache Spark

+1 vote

I want to output the contents of a collection to the Spark console.

I have a value of the following type:

linesWithSessionId: org.apache.spark.rdd.RDD[String] = FilteredRDD[3]
And I use the command:

scala> linesWithSessionId.map(line => println(line))
But only this is printed:

res1: org.apache.spark.rdd.RDD[Unit] = MappedRDD[4] at map at <console>:19

How can I print the RDD to the console, or save it to disk, so I can view its contents?

Jul 6, 2018 in Apache Spark by Shubham
• 13,490 points
60,647 views

8 answers to this question.

+1 vote

If you want to view the contents of an RDD, one way is to use collect():

myRDD.collect().foreach(println)
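That's not a good idea, though, when the RDD has billions of lines, because collect() brings every element back to the driver and can run it out of memory. Use take(n) to fetch just the first n elements and print them: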

myRDD.take(n).foreach(println)

Hope this will help you.
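To see the difference concretely, here is a minimal sketch that can be pasted into spark-shell (Spark 2.x or later assumed). The builder call returns the shell's existing session, or creates a local one when the code runs as a standalone Scala script. The `lines` RDD is hypothetical sample data standing in for the question's `linesWithSessionId`.

```scala
import org.apache.spark.sql.SparkSession

// Returns the active session in spark-shell; otherwise starts a local one.
val spark = SparkSession.builder().master("local[*]").appName("print-rdd").getOrCreate()
val sc = spark.sparkContext

// Hypothetical sample data standing in for linesWithSessionId.
val lines = sc.parallelize(1 to 1000).map(i => s"session-$i: some log line")

// collect() ships every element to the driver: fine for 1,000 strings,
// but it can exhaust driver memory for a very large RDD.
val everything: Array[String] = lines.collect()
println(s"collected ${everything.length} lines")

// take(n) ships only the first n elements to the driver.
val firstFive: Array[String] = lines.take(5)
firstFive.foreach(println)
```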

answered Jul 6, 2018 by nitinrawat895
• 11,380 points
0 votes
  • The map function is a transformation, which means that Spark will not actually evaluate your RDD until you run an action on it.
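  • To print it, you can use foreach (which is an action): linesWithSessionId.foreach(println). On a cluster, though, foreach runs on the executors, so the output goes to the executors' stdout rather than to the driver console.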
  • To write it to disk, you can use one of the saveAs... functions from the RDD API, such as saveAsTextFile or saveAsObjectFile (these are also actions).
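The difference between a transformation and an action can be checked with a short sketch (spark-shell or a Scala script, Spark 2.x or later assumed; the `words` RDD is made up for illustration). The map call returns immediately without running anything; work only happens when an action such as foreach or collect is called.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("lazy-vs-action").getOrCreate()
val sc = spark.sparkContext

// Hypothetical sample data.
val words = sc.parallelize(Seq("alpha", "beta", "gamma"))

// Transformation: Spark only records this step and returns a new RDD[Unit].
// Nothing is printed here, which is what happened with map(line => println(line)).
val notYetRun = words.map(w => println(w))

// Action: forces evaluation and runs println for each element.
words.foreach(println)

// Another action: returns the (small) results to the driver as an Array.
val lengths: Array[Int] = words.map(_.length).collect()
println(lengths.mkString(", ")) // prints 5, 4, 5
```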
answered Aug 7, 2018 by zombie
• 3,790 points
0 votes

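Here's another way:

linesWithSessionId.toArray().foreach(line => println(line))

Note that toArray() is a deprecated alias for collect(), so the same memory caveat applies to large RDDs; in current Spark code, use collect() instead.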
answered Dec 10, 2018 by jisen
+1 vote

You can first convert it to a DataFrame and then print it (toDF() needs import spark.implicits._, which spark-shell performs automatically):

line.toDF().show()
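A slightly fuller sketch of this approach, assuming Spark 2.x or later; the `line` RDD below is hypothetical sample data. It also shows how to stop show() from cutting off long strings.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("rdd-to-df").getOrCreate()
// Provides toDF() on RDDs; spark-shell performs this import automatically.
import spark.implicits._

// Hypothetical RDD[String] standing in for `line`.
val line = spark.sparkContext.parallelize(
  Seq("first line", "second line", "a third line that is longer than twenty characters"))

// An RDD[String] becomes a DataFrame with a single column named "value".
val df = line.toDF()

// show() prints at most 20 rows and truncates strings longer than 20 characters.
df.show()

// show(numRows, truncate = false) prints the values at full width.
df.show(10, truncate = false)
```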
answered Dec 10, 2018 by Nahoju
+2 votes

Save it to a text file:

line.saveAsTextFile("alicia.txt")

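Then print the contents of the text file (note that saveAsTextFile actually creates a directory named alicia.txt containing one part file per partition):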
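Printing the saved file's contents could look like the sketch below. Assumptions: Spark 2.x or later, hypothetical sample data for `line`, and a fresh temporary directory instead of the relative path "alicia.txt", because saveAsTextFile fails if the output path already exists. sc.textFile reads all the part files in the output directory back as one RDD.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("save-and-read").getOrCreate()
val sc = spark.sparkContext

// Hypothetical data standing in for `line`, split into 2 partitions.
val line = sc.parallelize(Seq("alpha", "beta", "gamma"), numSlices = 2)

// saveAsTextFile refuses to overwrite, so write under a fresh temp directory.
val outPath = Files.createTempDirectory("rdd-demo").resolve("alicia.txt").toString
line.saveAsTextFile(outPath)

// Despite its name, outPath is now a directory of part-00000, part-00001, ... files.
val isDir = new java.io.File(outPath).isDirectory

// textFile reads every part file in the directory back as one RDD[String].
val readBack: Array[String] = sc.textFile(outPath).collect()
readBack.foreach(println)
```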

answered Dec 10, 2018 by Akshay
–1 vote
print (line.take(n))
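That line is Python (PySpark) syntax, where printing the list returned by take(n) shows its elements. In Scala, take(n) returns an Array, and printing an Array directly shows its type and hash code (something like [Ljava.lang.String;@1b2c3d) rather than its contents, so it needs explicit formatting. A minimal Scala sketch with hypothetical data, Spark 2.x or later assumed:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("take-n").getOrCreate()

// Hypothetical data standing in for `line`.
val line = spark.sparkContext.parallelize(Seq("a", "b", "c", "d"))
val n = 2

// take(n) returns Array[String]; mkString turns it into readable text.
val firstN: Array[String] = line.take(n)
println(firstN.mkString("[", ", ", "]")) // prints [a, b]
```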
answered Dec 10, 2018 by Rahul
–1 vote

Simple and easy:

line.foreach(println)
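One caveat: foreach(println) runs on the executors, so on a cluster its output goes to executor logs rather than your console. If you want the elements printed on the driver but the RDD is too big to collect() at once, RDD.toLocalIterator is one option: it fetches partitions to the driver one at a time, so the driver only needs memory for the largest partition. A sketch with made-up data (Spark 2.x or later assumed), printing just the first 20 elements:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("local-iterator").getOrCreate()

// Hypothetical larger RDD split into 8 partitions.
val big = spark.sparkContext.parallelize(1 to 100000, numSlices = 8)

// toLocalIterator returns an Iterator on the driver that pulls
// partitions one at a time as it advances.
val firstTwenty: List[Int] = big.toLocalIterator.take(20).toList

// println runs on the driver here, so the output appears in this console.
firstTwenty.foreach(println)
```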
answered Dec 10, 2018 by Kuber
0 votes

Hi,

You can use an approach like the following to see the contents of an RDD:

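// In spark-shell, `spark` is predefined as the active SparkSession.
val dept = List(("Finance", 10), ("Marketing", 20), ("Sales", 30), ("IT", 40))
val rdd = spark.sparkContext.parallelize(dept)
val dataColl = rdd.collect()  // brings all elements back to the driver
dataColl.foreach(println)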
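For a pair RDD like this one, println shows each tuple as (Finance,10). If you prefer more readable output, you can destructure the pairs after collecting. A small sketch using the same sample data, assuming spark-shell or a standalone script with a local session (Spark 2.x or later):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("pair-rdd").getOrCreate()

val dept = List(("Finance", 10), ("Marketing", 20), ("Sales", 30), ("IT", 40))
val rdd = spark.sparkContext.parallelize(dept)

// collect() returns Array[(String, Int)]; the pattern splits each tuple.
val formatted: Array[String] = rdd.collect().map { case (name, id) => s"$name: $id" }
formatted.foreach(println)
```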
answered Dec 15, 2020 by MD
• 95,440 points

Related Questions In Apache Spark

0 votes
1 answer

How to save RDD in Apache Spark?

Hey, there are a few methods provided by the ...

answered Jul 23, 2019 in Apache Spark by Gitika
• 65,910 points
3,250 views
0 votes
1 answer

How to get the number of elements in partition?

rdd.mapPartitions(iter => Array(iter.size).iterator, true) This command will ...

answered May 8, 2018 in Apache Spark by kurt_cobain
• 9,390 points
1,931 views
0 votes
1 answer

How to save and retrieve the Spark RDD from HDFS?

You can save the RDD using the saveAsObjectFile and saveAsTextFile methods. ...

answered May 29, 2018 in Apache Spark by Shubham
• 13,490 points
13,001 views
0 votes
1 answer

Writing File into HDFS using spark scala

The reason you are not able to ...

answered Apr 6, 2018 in Big Data Hadoop by kurt_cobain
• 9,390 points
16,703 views
0 votes
1 answer

Is there any way to check the Spark version?

There are 2 ways to check the ...

answered Apr 19, 2018 in Apache Spark by nitinrawat895
• 11,380 points
7,985 views
0 votes
1 answer

What's the difference between 'filter' and 'where' in Spark SQL?

Both 'filter' and 'where' in Spark SQL ...

answered May 23, 2018 in Apache Spark by nitinrawat895
• 11,380 points
33,759 views
0 votes
5 answers

How to change the spark Session configuration in Pyspark?

You aren't actually overwriting anything with this ...

answered Dec 14, 2020 in Apache Spark by Gitika
• 65,910 points
121,591 views
+1 vote
3 answers

What is the difference between rdd and dataframes in Apache Spark ?

Comparison between Spark RDD vs DataFrame 1. Release ...

answered Aug 28, 2018 in Apache Spark by shams
• 3,670 points
42,318 views