Why is collect in SparkR slow?

0 votes
I have a large DataFrame with more than 300K rows. The data is stored in a Parquet file. My machine has 4 cores and 8 GB of RAM.

I'm using R 3.0 and Spark 2.0. To bring the dataset into R, I used the collect() function. It took around 3-4 minutes for the data to load into R.

Does the collect method usually take this much time?
May 3, 2018 in Apache Spark by shams
• 3,600 points
248 views

1 answer to this question.

0 votes
It's not collect() itself that is slow. Spark uses lazy evaluation: transformations are only recorded as a DAG, and nothing actually runs until an action (here, collect()) is called. So the time you measure for collect() includes reading the Parquet file and running every pending transformation, as well as bringing the result back into R.

That said, loading 300K rows into R will take some time in any case.
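
To make the lazy-evaluation point concrete, here is a minimal toy sketch in plain Python. It is not Spark or SparkR code: `LazyFrame`, `double`, and `calls` are hypothetical names made up for illustration. The sketch only shows that recorded transformations do no work until the action runs, so all the cost appears to belong to `collect()`.

```python
class LazyFrame:
    """Toy stand-in for a lazily evaluated DataFrame (hypothetical, not Spark's API)."""

    def __init__(self, rows, steps=()):
        self._rows = rows
        self._steps = tuple(steps)  # recorded transformations, not yet executed

    def map(self, fn):
        # Transformation: only records the step and returns a new plan.
        return LazyFrame(self._rows, self._steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: only records the step and returns a new plan.
        return LazyFrame(self._rows, self._steps + (("filter", pred),))

    def collect(self):
        # Action: runs every recorded step over the source rows.
        out = []
        for row in self._rows:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    row = fn(row)
                elif not fn(row):
                    keep = False
                    break
            if keep:
                out.append(row)
        return out


calls = []  # records every row the transformation actually touches


def double(x):
    calls.append(x)
    return x * 2


frame = LazyFrame(range(5)).map(double).filter(lambda x: x > 2)
work_before_collect = len(calls)  # nothing has run yet
result = frame.collect()          # the whole pipeline runs here
work_after_collect = len(calls)

print(work_before_collect, work_after_collect, result)  # prints: 0 5 [4, 6, 8]
```

In SparkR, one way to see how much of the time is computation rather than transfer would be to time another action, such as count(), separately from collect(); whether that changes the picture depends on the transformations involved.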
answered May 3, 2018 by Data_Nerd
• 2,370 points
