Json and snappy compression

–1 vote

When I try to write a JSON file using Snappy compression, the method below does not work.

sqlContext.setConf("spark.sql.json.compression.codec","snappy")
filterStatus.write.json("/user/hduser_212418/heorder_json")

What changes need to be made to the above code so that it saves in Snappy-compressed format? Only the one below works.

filterStatus.toJSON.rdd.saveAsTextFile("/user/hduser_212418/heorder_json",classOf[org.apache.hadoop.io.compress.SnappyCodec])

The input to the above is:

val filterStatus = rdFile.filter("order_status like '%Y%'")
filterStatus: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [order_id: int, order_date: bigint ... 2 more fields]
Jan 11, 2019 in Big Data Hadoop by digger
• 26,740 points
3,751 views

1 answer to this question.

0 votes

The issue arises because the configuration sqlContext.setConf("spark.sql.json.compression.codec", "snappy") is not a setting that the DataFrame .write.json() operation recognizes for controlling the compression codec. The correct way to specify compression for a DataFrame write is to set an option() on the write operation. Here's how you can modify your code:

filterStatus.write
  .option("compression", "snappy")
  .json("/user/hduser_212418/heorder_json")

Explanation:

  • Instead of setting the codec at the sqlContext level, the option("compression", "snappy") method ensures that the JSON write operation uses Snappy compression.
  • This method is simpler and fits directly into the .write.json() logic.
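As a fuller sketch (assuming an existing SparkSession named `spark`; the path and the filterStatus Dataset come from the question), the compressed output can be written and then read back like this:

```scala
// Sketch only: assumes a running SparkSession `spark` and the
// filterStatus Dataset from the question.
filterStatus.write
  .mode("overwrite")                    // overwrite any previous run
  .option("compression", "snappy")      // part files get a .snappy suffix
  .json("/user/hduser_212418/heorder_json")

// Spark detects the codec from the file extension when reading back,
// so no compression option is needed on the read side.
val roundTrip = spark.read.json("/user/hduser_212418/heorder_json")
roundTrip.show()
```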

Why the Original toJSON.rdd.saveAsTextFile() Works:

  • The filterStatus.toJSON.rdd.saveAsTextFile() approach works because it explicitly transforms the DataFrame to an RDD of JSON strings, and then the saveAsTextFile method uses Hadoop's SnappyCodec for compression.

Key Differences:

  • The .write.json() with .option("compression", "snappy") approach is more idiomatic for Spark's DataFrame API.
  • toJSON.rdd.saveAsTextFile() is lower-level: it converts the data to an RDD of JSON strings before saving. That offers more control over the Hadoop codec class, but it bypasses DataFrame write optimizations and loses the schema-aware write path.
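Snappy is also not the only choice here: the DataFrameWriter's compression option accepts codecs such as none, gzip, bzip2, lz4, snappy and deflate, so swapping the codec is a one-line change (sketch, same assumptions as above; the _gz output path is hypothetical):

```scala
// gzip instead of snappy; output part files end in .json.gz
filterStatus.write
  .option("compression", "gzip")
  .json("/user/hduser_212418/heorder_json_gz")
```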
answered Jan 11, 2019 by Omkar
• 69,220 points
