How to open/stream .zip files through Spark?

0 votes
I have zip files that I would like to open 'through' Spark. I can open .gzip file no problem because of Hadoops native Codec support, but am unable to do so with .zip files.

Is there an easy way to read a zip file in your Spark code? I've also searched for zip codec implementations to add to the CompressionCodecFactory, but am unsuccessful so far.
Nov 20, 2018 in Apache Spark by Neha
• 6,280 points

recategorized Nov 20, 2018 by Neha 321 views

1 answer to this question.

0 votes

You can try and check this below code:

using API sparkContext.newAPIHadoopRDD(
    hadoopConf,
    InputFormat.class,
    ImmutableBytesWritable.class, Result.class)
answered Nov 20, 2018 by Frankie
• 9,810 points

Related Questions In Apache Spark

0 votes
1 answer

How to stop messages from being displayed on spark console?

In your log4j.properties file you need to ...READ MORE

answered Apr 24, 2018 in Apache Spark by kurt_cobain
• 9,240 points
1,047 views
0 votes
1 answer

How to get Spark dataset metadata?

There are a bunch of functions that ...READ MORE

answered Apr 26, 2018 in Apache Spark by kurt_cobain
• 9,240 points
359 views
0 votes
3 answers

How to transpose Spark DataFrame?

Please check the below mentioned links for ...READ MORE

answered Dec 31, 2018 in Apache Spark by anonymous
5,275 views
0 votes
1 answer
0 votes
1 answer
0 votes
1 answer

Is it possible to run Apache Spark without Hadoop?

Though Spark and Hadoop were the frameworks designed ...READ MORE

answered May 2 in Big Data Hadoop by ravikiran
• 4,560 points
62 views
0 votes
1 answer

What do we exactly mean by “Hadoop” – the definition of Hadoop?

The official definition of Apache Hadoop given ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by Shubham
185 views
0 votes
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,670 points
2,648 views
0 votes
1 answer

How to get ID of a map task in Spark?

you can access task information using TaskContext: import org.apache.spark.TaskContext sc.parallelize(Seq[Int](), ...READ MORE

answered Nov 20, 2018 in Apache Spark by Frankie
• 9,810 points
321 views
0 votes
1 answer

Filter, Option or FlatMap in spark

If, for option 2, you mean have ...READ MORE

answered Nov 9, 2018 in Apache Spark by Frankie
• 9,810 points
430 views