Why is collect in SparkR slow?

0 votes
I have a huge dataframe which has more than 300k rows. The data is in a parquet file. My machine has 4 cores and 8GB of RAM.

R version 3.0 and spark 2.0. To bring dataset into R, I used the collect() function. It took around 3-4 mins for the data to be loaded into R.

Does the collect method usually take this much time?
May 3, 2018 in Apache Spark by shams
• 3,580 points
57 views

1 answer to this question.

Your answer

Your name to display (optional):
Privacy: Your email address will only be used for sending these notifications.
0 votes
It's not the collect() that is slow. Actually, Spark works on the principle of Lazy evaluations, ie. all the transformations are done in a DAG basis and the actions (here it's the collect()) is done at last using the original data, so that's why it might take time.

But having a 300K row data will take some time in loading.
answered May 3, 2018 by Data_Nerd
• 2,340 points

Related Questions In Apache Spark

0 votes
1 answer

Why is Spark faster than Hadoop Map Reduce

Firstly, it's the In-memory computation, if the file ...READ MORE

answered Apr 30, 2018 in Apache Spark by shams
• 3,580 points
53 views
0 votes
1 answer

Can anyone explain what is RDD in Spark?

RDD is a fundamental data structure of ...READ MORE

answered May 24, 2018 in Apache Spark by Shubham
• 12,110 points
465 views
0 votes
1 answer

Spark 2.3? What is new in it?

Here are the changes in new version ...READ MORE

answered May 28, 2018 in Apache Spark by kurt_cobain
• 9,260 points
31 views
0 votes
1 answer

Which is better in term of speed, Shark or Spark?

Spark is a framework for distributed data ...READ MORE

answered Jun 25, 2018 in Apache Spark by nitinrawat895
• 9,030 points
22 views
0 votes
1 answer

What do we exactly mean by “Hadoop” – the definition of Hadoop?

The official definition of Apache Hadoop given ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by Shubham
117 views
+1 vote
1 answer
0 votes
3 answers

Can we run Spark without using Hadoop?

No, you can run spark without hadoop. ...READ MORE

answered May 7 in Big Data Hadoop by pradeep
86 views
0 votes
1 answer

Joining Multiple Spark Dataframes

You can run the below code to ...READ MORE

answered Mar 26, 2018 in Big Data Hadoop by Bharani
• 4,550 points
122 views
0 votes
1 answer

cache tables in apache spark sql

Caching the tables puts the whole table ...READ MORE

answered May 4, 2018 in Apache Spark by Data_Nerd
• 2,340 points
377 views
0 votes
1 answer

Is it possible to run Spark and Mesos along with Hadoop?

Yes, it is possible to run Spark ...READ MORE

answered May 29, 2018 in Apache Spark by Data_Nerd
• 2,340 points
23 views

© 2018 Brain4ce Education Solutions Pvt. Ltd. All rights Reserved.
"PMP®","PMI®", "PMI-ACP®" and "PMBOK®" are registered marks of the Project Management Institute, Inc. MongoDB®, Mongo and the leaf logo are the registered trademarks of MongoDB, Inc.