Parquet Files Advantages

What are the advantages of using Parquet Files?
Jun 21, 2018 in Apache Spark by Data_Nerd

2 answers to this question.

Parquet is a columnar file format that helps to:

  • Limit I/O operations
  • Consume less storage space
  • Fetch only the required columns
Hope this helps.
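To make the "fetch only required columns" point concrete, here is a minimal pure-Python sketch (not Parquet itself, just an illustration of the idea): in a row-oriented layout you scan every full record even when you need one field, while in a column-oriented layout each column is stored contiguously and can be read on its own.

```python
# Illustrative sketch of row-oriented vs. column-oriented access.
# (This simulates the layout in plain Python; real Parquet stores
# columns as compressed, encoded chunks on disk.)

rows = [
    {"id": 1, "name": "a", "score": 0.5},
    {"id": 2, "name": "b", "score": 0.7},
    {"id": 3, "name": "c", "score": 0.9},
]

# Row-oriented storage: to get just "score", we still have to
# touch every full record.
row_scores = [r["score"] for r in rows]

# Column-oriented storage: each column lives contiguously, so
# fetching one column reads only that column's values.
columns = {
    "id": [1, 2, 3],
    "name": ["a", "b", "c"],
    "score": [0.5, 0.7, 0.9],
}
col_scores = columns["score"]

assert row_scores == col_scores  # same data, far less touched per query
```

In a real query engine like Spark, this is why `spark.read.parquet(path).select("score")` can skip the other columns entirely instead of parsing whole rows.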
answered Jun 21, 2018 by kurt_cobain
• 9,310 points
Parquet is a columnar format supported by many data processing systems. The benefits of columnar storage are:

1. Columnar storage limits I/O operations.

2. Columnar storage can fetch only the specific columns you need to access.

3. Columnar storage consumes less space.

4. Columnar storage gives better-summarized data and follows type-specific encoding.
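Point 4 is worth unpacking: because all values in a column share one type, the format can pick an encoding suited to that type. A common example is run-length encoding for columns with long runs of repeated values. The sketch below is a simplified illustration of the idea, not Parquet's actual on-disk encoding:

```python
def run_length_encode(values):
    """Collapse runs of repeated values into (value, count) pairs."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            # Extend the current run instead of storing the value again.
            encoded[-1] = (v, encoded[-1][1] + 1)
        else:
            encoded.append((v, 1))
    return encoded

# A low-cardinality column, as often seen in real datasets:
column = ["US", "US", "US", "UK", "UK", "US"]
encoded = run_length_encode(column)
# encoded == [("US", 3), ("UK", 2), ("US", 1)] -- 3 pairs instead of 6 values
```

With a uniform column type, encodings like this (and dictionary encoding) shrink storage and speed up scans, which row-oriented formats with mixed types per record cannot exploit as easily.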
answered Jul 3, 2018 by zombie
• 3,750 points
