How to open/stream .zip files through Spark?

0 votes
I have zip files that I would like to open 'through' Spark. I can open .gzip file no problem because of Hadoops native Codec support, but am unable to do so with .zip files.

Is there an easy way to read a zip file in your Spark code? I've also searched for zip codec implementations to add to the CompressionCodecFactory, but am unsuccessful so far.
Nov 20, 2018 in Apache Spark by Neha
• 6,180 points

recategorized Nov 20, 2018 by Neha 185 views

1 answer to this question.

Your answer

Your name to display (optional):
Privacy: Your email address will only be used for sending these notifications.
0 votes

You can try and check this below code:

using API sparkContext.newAPIHadoopRDD(
    hadoopConf,
    InputFormat.class,
    ImmutableBytesWritable.class, Result.class)
answered Nov 20, 2018 by Frankie
• 9,710 points

Related Questions In Apache Spark

0 votes
1 answer

How to stop messages from being displayed on spark console?

In your log4j.properties file you need to ...READ MORE

answered Apr 24, 2018 in Apache Spark by kurt_cobain
• 9,260 points
723 views
0 votes
1 answer

How to get Spark dataset metadata?

There are a bunch of functions that ...READ MORE

answered Apr 26, 2018 in Apache Spark by kurt_cobain
• 9,260 points
225 views
0 votes
3 answers

How to transpose Spark DataFrame?

Please check the below mentioned links for ...READ MORE

answered Dec 31, 2018 in Apache Spark by anonymous
3,632 views
0 votes
1 answer
0 votes
0 answers
0 votes
1 answer

Is it possible to run Apache Spark without Hadoop?

Though Spark and Hadoop were the frameworks designed ...READ MORE

answered May 2 in Big Data Hadoop by ravikiran
• 2,200 points
27 views
0 votes
1 answer

What do we exactly mean by “Hadoop” – the definition of Hadoop?

The official definition of Apache Hadoop given ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by Shubham
134 views
0 votes
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 9,390 points
1,837 views
0 votes
1 answer

How to get ID of a map task in Spark?

you can access task information using TaskContext: import org.apache.spark.TaskContext sc.parallelize(Seq[Int](), ...READ MORE

answered Nov 20, 2018 in Apache Spark by Frankie
• 9,710 points
141 views
0 votes
1 answer

Filter, Option or FlatMap in spark

If, for option 2, you mean have ...READ MORE

answered Nov 9, 2018 in Apache Spark by Frankie
• 9,710 points
285 views

© 2018 Brain4ce Education Solutions Pvt. Ltd. All rights Reserved.
"PMP®","PMI®", "PMI-ACP®" and "PMBOK®" are registered marks of the Project Management Institute, Inc. MongoDB®, Mongo and the leaf logo are the registered trademarks of MongoDB, Inc.