Why is Spark map output compressed?

0 votes
I am using Spark for MapReduce and I see that the output file after the map phase is always compressed. Why does this happen?
Feb 24, 2019 in Apache Spark by Sanam
1,100 views

1 answer to this question.

0 votes

Spark compresses map output files by default, and that is generally a good idea: these files are written to disk and sent over the network during the shuffle, so compressing them saves I/O at the cost of some CPU. The behavior is controlled by the property spark.shuffle.compress, which decides whether map output files are compressed and is set to true by default. The codec used is the one configured in spark.io.compression.codec. If you do not want the output to be compressed, you can set this property to false dynamically when you submit the application:

./bin/spark-submit --conf spark.shuffle.compress=false
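
To get a rough feel for why compressing map output is usually worth it, here is a minimal sketch that does not use Spark at all. It builds some made-up word-count style records and compresses them with Python's zlib. Both the records and the helper names (simulated_map_output, compression_ratio) are hypothetical, and zlib is only a stand-in for whatever codec Spark is configured to use, so treat the result as an illustration rather than a measurement of Spark itself.

```python
import zlib


def simulated_map_output(num_records: int = 10_000) -> bytes:
    """Build bytes resembling (word, 1) pairs from a word-count map phase.

    The records are invented for illustration; real shuffle files use
    Spark's own serialization format.
    """
    words = ["spark", "shuffle", "map", "reduce", "compress"]
    lines = [f"{words[i % len(words)]}\t1\n" for i in range(num_records)]
    return "".join(lines).encode("utf-8")


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data)) / len(data)


raw = simulated_map_output()
compressed = zlib.compress(raw)  # stand-in codec, not what Spark uses
ratio = compression_ratio(raw)

print(f"raw bytes:        {len(raw)}")
print(f"compressed bytes: {len(compressed)}")
print(f"ratio:            {ratio:.3f}")
```

Repetitive keys like these compress very well, which is typical of map output. If your data is already compressed or close to random, the saving is much smaller, and turning spark.shuffle.compress off may be reasonable.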
answered Feb 24, 2019 by Wasim
