Efficient way to read specific columns from a Parquet file in Spark

I was wondering: is spark.read.parquet(../parquet file path).select(...) the best way to read a subset of columns from a Parquet file in Spark? Are there any other options?
Apr 20, 2018 in Apache Spark by Ashish

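Since Parquet is a columnar storage format, Spark only reads the columns you actually select: the select(...) is pushed into the Parquet scan (column pruning), so the remaining columns are never read from storage. So

val df = spark.read.parquet("fs://path/file.parquet").select(...)

(or, equivalently, spark.read.format("parquet").load("fs://path/file.parquet").select(...))

is the best option.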
answered Apr 20, 2018 by kurt_cobain
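One way to confirm that only the selected columns are read is to look at the physical plan. The following is a minimal sketch, assuming Spark 2.x or later with spark-sql on the classpath and a local master; the object name ParquetColumnSubset, the helper readColumns, the sample data and the path /tmp/people.parquet are all hypothetical, chosen only for illustration. It writes a small three-column Parquet file, reads back two columns, and calls explain(); the FileScan node's ReadSchema should list only the selected columns.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ParquetColumnSubset {
  // Hypothetical helper: reads only the requested columns from a Parquet dataset.
  // Because Parquet stores each column separately, Spark prunes the scan so that
  // only these columns are read from storage.
  def readColumns(spark: SparkSession, path: String, cols: Seq[String]): DataFrame = {
    require(cols.nonEmpty, "at least one column must be requested")
    spark.read.parquet(path).select(cols.head, cols.tail: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-column-subset")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data written to a hypothetical local path.
    val path = "/tmp/people.parquet"
    Seq((1, "alice", 30), (2, "bob", 25))
      .toDF("id", "name", "age")
      .write.mode("overwrite").parquet(path)

    val subset = readColumns(spark, path, Seq("id", "name"))
    // The FileScan parquet node should show ReadSchema: struct<id:int,name:string>,
    // i.e. the "age" column is not read.
    subset.explain()
    subset.show()

    spark.stop()
  }
}
```

The check is the ReadSchema line in the explain() output rather than the DataFrame's columns: the DataFrame would have the same columns either way, but the ReadSchema shows what Spark actually asks the Parquet reader for.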
