Efficient way to read specific columns from parquet file in spark

0 votes
I was wondering: is spark.read.parquet(../parquet file path).select(...) the best way to read a subset of columns from a Parquet file in Spark? Are there any other options?
Apr 20, 2018 in Apache Spark by Ashish
• 2,650 points
7,885 views

1 answer to this question.

0 votes

Since Parquet is a columnar storage format, Spark only has to read the columns you select, so

val df = spark.read.parquet("fs://path/file.parquet").select(...)

is the best option.
answered Apr 20, 2018 by kurt_cobain
• 9,350 points
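
To make the answer above concrete, here is a minimal sketch of reading only a few named columns. The object name `ParquetColumnSubset`, the helper `readColumns`, and the column names in the usage note are hypothetical and not from the original thread; only `spark.read.parquet(...)` and `select(...)` come from the answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object ParquetColumnSubset {
  // Hypothetical helper (not part of Spark): reads a Parquet path and keeps
  // only the requested columns. Because Spark evaluates lazily, the select
  // is taken into account when the scan is planned, so the unselected
  // columns should not need to be read from the file.
  def readColumns(spark: SparkSession, path: String, columns: Seq[String]): DataFrame =
    spark.read.parquet(path).select(columns.map(col): _*)
}
```

For example, assuming the file has columns named `id` and `name`, `ParquetColumnSubset.readColumns(spark, "fs://path/file.parquet", Seq("id", "name")).explain()` prints the physical plan; the `ReadSchema` entry of the Parquet scan should list only the selected columns, which is a quick way to check that pruning is happening.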
