Efficient way to read specific columns from parquet file in spark

Question

Is spark.read.parquet(../parquet file path).select(...) the best way to read a subset of columns from a Parquet file in Spark? Are there any other options?

kurt_cobain · Answer 1 · Apr 20, 2018

Parquet is a columnar storage format, so Spark reads only the columns you select and skips the rest of the file. That means

val df = spark.read.parquet("fs://path/file.parquet").select(...)

is the best option.
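To make this concrete, here is a minimal sketch of the read-then-select pattern. The object name, the readColumns helper, the sample data, and the column names are all made up for illustration, and it assumes Spark SQL is on the classpath and runs in local mode. It writes a small Parquet file to a temp directory first so it does not depend on any existing path.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object ParquetColumnSubset {
  // Hypothetical helper: read a Parquet path and keep only the named columns.
  def readColumns(spark: SparkSession, path: String, columns: Seq[String]): DataFrame =
    spark.read.parquet(path).select(columns.map(col): _*)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-column-subset")
      .master("local[*]") // assumption: local mode, for demonstration only
      .getOrCreate()
    import spark.implicits._

    // Illustrative sample data, written out so the sketch is self-contained.
    val path = Files.createTempDirectory("parquet-demo")
      .resolve("people.parquet").toString
    Seq((1, "Ann", 34, "NYC"), (2, "Bob", 41, "LA"))
      .toDF("id", "name", "age", "city")
      .write.parquet(path)

    val subset = readColumns(spark, path, Seq("id", "name"))

    // Print the physical plan; the Parquet scan's ReadSchema should
    // mention only the selected columns if pruning was applied.
    subset.explain()
    subset.show()

    spark.stop()
  }
}
```

Checking the plan with explain() is a quick way to confirm that the select was pushed down to the scan on your Spark version, rather than taking it on faith.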



