How to print the contents of RDD in Apache Spark?

0 votes

I want to output the contents of a collection to the Spark console.

I have a type:

linesWithSessionId: org.apache.spark.rdd.RDD[String] = FilteredRDD[3]
And I use the command:

scala> linesWithSessionId.map(line => println(line))
But this prints:

res1: org.apache.spark.rdd.RDD[Unit] = MappedRDD[4] at map at :19

How can I write the RDD to console or save it to disk so I can view its contents?

Jul 6, 2018 in Apache Spark by Shubham
• 13,450 points
32,642 views

7 answers to this question.

0 votes

If you want to view the contents of an RDD, one way is to use collect(), which pulls the entire dataset back to the driver:

myRDD.collect().foreach(println)

That's not a good idea, though, when the RDD has billions of lines, since the driver can run out of memory. Use take() to print just a few elements:

myRDD.take(n).foreach(println)

Hope this will help you.
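The reason take(n) is cheap can be sketched with a plain-Scala analogy (no Spark required; the counter and object name below are purely illustrative): a lazy iterator, like an RDD, only produces the elements that are actually requested.

```scala
object TakeSketch {
  // Counts how many elements an unbounded lazy source actually produces
  // when only n of them are requested.
  var produced = 0

  def run(n: Int): Int = {
    produced = 0
    // An unbounded "dataset": each element is produced on demand.
    val data = Iterator.from(1).map { x => produced += 1; x }
    // Like rdd.take(n): only n elements are ever materialized.
    data.take(n).foreach(_ => ())
    produced
  }
}
```

Calling TakeSketch.run(5) produces exactly 5 elements even though the source is infinite, which mirrors why take(n) is safe on a huge RDD while collect() is not.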

answered Jul 6, 2018 by nitinrawat895
• 10,950 points
0 votes
  • The map function is a transformation, which means that Spark will not actually evaluate your RDD until you run an action on it.
  • To print it, you can use foreach (which is an action):

    linesWithSessionId.foreach(println)

  • To write it to disk, you can use one of the saveAs... functions (still actions) from the RDD API.
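The transformation-vs-action distinction above can be sketched in plain Scala without Spark (LazyList stands in for an RDD here; the counter is purely illustrative): map does nothing until something forces the evaluation.

```scala
object LazySketch {
  // Returns (evaluations before the "action", evaluations after it).
  def run(): (Int, Int) = {
    var evaluated = 0
    val data = LazyList(1, 2, 3)
    // Like an RDD transformation: nothing is computed yet.
    val mapped = data.map { x => evaluated += 1; x * 2 }
    val before = evaluated    // still 0, because map was lazy
    mapped.foreach(_ => ())   // like an RDD action: forces evaluation
    (before, evaluated)
  }
}
```

This is why `linesWithSessionId.map(line => println(line))` in the question printed nothing but a type signature: the map was never executed.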
answered Aug 6, 2018 by zombie
• 3,750 points
0 votes

Here's another way, using toArray() (deprecated since Spark 1.0; prefer collect()):

linesWithSessionId.toArray().foreach(line => println(line))
answered Dec 10, 2018 by jisen
0 votes

You can first convert it to a DataFrame and then print it (this needs the toDF implicit in scope, e.g. import spark.implicits._):

line.toDF().show()
answered Dec 10, 2018 by Nahoju
+1 vote

Save it to a text file (note that saveAsTextFile creates a directory named alicia.txt containing part files, not a single file):

line.saveAsTextFile("alicia.txt")

Print the contents of the saved text file:

sc.textFile("alicia.txt").collect().foreach(println)

answered Dec 10, 2018 by Akshay
–1 vote
println(line.take(n).mkString("\n"))  // take(n) returns an Array, so format it before printing
answered Dec 10, 2018 by Rahul
–1 vote

Simple and easy:

line.foreach(println)

Note that on a cluster this prints to the executors' stdout, not the driver console; collect() first if you need the output locally.
answered Dec 10, 2018 by Kuber
