Cache tables in Apache Spark SQL

I was going through the Apache Spark documentation and couldn't understand the part about "caching tables using the in-memory columnar format". What does it mean?
Thanks in advance.
May 4, 2018 in Apache Spark by shams

1 answer to this question.


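Caching a table keeps its data in memory in a columnar layout, so later queries can scan only the columns they need instead of rereading or recomputing the source. Spark works on the principle of lazy evaluation: transformations only build up an execution plan (the DAG), and nothing is computed until an action runs. Caching through dataFrame.cache() or spark.catalog.cacheTable("tableName") follows the same rule. The table is only marked for caching at first, it is materialized in memory when the first action touches it, and subsequent actions reuse the cached copy. Call spark.catalog.uncacheTable("tableName") to free the memory.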

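There's a setting for this too. When it is true (the default), Spark SQL automatically picks a compression codec for each cached column based on statistics of the data: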

spark.sql.inMemoryColumnarStorage.compressed = true

answered May 4, 2018 by Data_Nerd
