Efficient way to read specific columns from a Parquet file in Spark

0 votes
Is spark.read.parquet("<path to parquet file>").select(...) the best way to read a subset of columns from a Parquet file in Spark? Are there any other options?
Apr 20, 2018 in Apache Spark by Ashish
• 2,630 points

1 answer to this question.

0 votes

Because Parquet is a columnar storage format, Spark only needs to read the columns you select, so

val df = spark.read.parquet("fs://path/file.parquet").select(...)

is the best option.
answered Apr 20, 2018 by kurt_cobain
• 9,260 points
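
To make the answer above concrete, here is a minimal sketch of reading only some columns and checking what Spark actually scans. The object name, the helper selectColumns, the sample columns (id, name, score), the temporary path, and the local master setting are all assumptions made for illustration; none of them come from the original question.

```scala
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object ReadParquetColumns {

  // Hypothetical helper: read a Parquet path and keep only the named columns.
  def selectColumns(spark: SparkSession, path: String, cols: Seq[String]): DataFrame =
    spark.read.parquet(path).select(cols.map(col): _*)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("read-parquet-columns")
      .master("local[*]") // assumption: running locally for the demo
      .getOrCreate()
    import spark.implicits._

    // Write a small sample file so the example is self-contained.
    val path = Files.createTempDirectory("parquet-demo").resolve("people.parquet").toString
    Seq((1, "a", 10.0), (2, "b", 20.0))
      .toDF("id", "name", "score")
      .write.mode("overwrite").parquet(path)

    val df = selectColumns(spark, path, Seq("id", "score"))

    // Print the physical plan; the Parquet scan's ReadSchema should
    // mention only the selected columns if pruning is applied.
    df.explain()
    df.show()

    spark.stop()
  }
}
```

Inspecting the plan with explain() is a quick way to confirm that the select is being pushed down to the Parquet reader rather than applied after a full read.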


© 2018 Brain4ce Education Solutions Pvt. Ltd. All rights Reserved.