Import zip files and process the Excel files inside them using PySpark connected with pymongo


How can I import zip files and process the Excel files inside them using PySpark connected with pymongo?

I have installed Spark, MongoDB and Python to process the files (Excel, CSV or JSON).

I used this code to connect PySpark with MongoDB:

from pyspark.sql import SparkSession

my_spark = SparkSession \
    .builder \
    .appName("myApp") \
    .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/test.coll") \
    .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/test.coll") \
    .getOrCreate()

Then I tried to import zip files, because I don't want to extract and open every file by hand before processing it.


Aug 10, 2019 in Python by Ahmed

1 answer to this question.


I found this sample code:

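import zipfile
import io

def zip_extract(x):
    # x is one (path, bytes) pair produced by sc.binaryFiles()
    in_memory_data = io.BytesIO(x[1])
    with zipfile.ZipFile(in_memory_data, "r") as file_obj:
        # Map each file name inside the archive to its raw bytes
        return {name: file_obj.read(name) for name in file_obj.namelist()}

# SparkContext of the session created in the question (notebooks and shells predefine it as sc)
sc = my_spark.sparkContext
zips = sc.binaryFiles("dbfs:/mnt/vedant-demo/ONG/data/las_raw/D-Dfiles.zip")
files_data = zips.map(zip_extract)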

Check if this works. Source: https://gist.github.com/vedantja/bd74d0ba7c350dd348af1f92eadd0e76
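The code above loads every member of each archive into a dict. Since the question is specifically about Excel files, here is a minimal sketch of a `flatMap`-friendly variant that keeps only Excel workbooks and emits one record per workbook. The function name `extract_excel_members` and the extension list are illustrative assumptions, not part of the original answer or of any Spark API.

```python
import io
import zipfile

# Extensions treated as Excel workbooks (an assumption; adjust as needed).
EXCEL_EXTENSIONS = (".xlsx", ".xlsm", ".xls")


def extract_excel_members(pair):
    """Return (zip_path, member_name, member_bytes) for each Excel file in one zip.

    `pair` is one (path, content) element as produced by sc.binaryFiles(...).
    A list is returned so the result can be passed straight to RDD.flatMap.
    """
    zip_path, content = pair
    results = []
    with zipfile.ZipFile(io.BytesIO(content), "r") as archive:
        for info in archive.infolist():
            # Skip directory entries inside the archive.
            if info.is_dir():
                continue
            # Keep only files whose name ends with an Excel extension (case-insensitive).
            if not info.filename.lower().endswith(EXCEL_EXTENSIONS):
                continue
            results.append((zip_path, info.filename, archive.read(info)))
    return results
```

With the session from the question, something like `my_spark.sparkContext.binaryFiles(path_to_zips).flatMap(extract_excel_members)` (where `path_to_zips` is a hypothetical placeholder) would give one record per workbook without unzipping anything by hand. Each workbook's bytes could then be parsed with `pandas.read_excel(io.BytesIO(data))`, assuming pandas and an Excel engine such as openpyxl are installed on every worker, and the resulting Spark DataFrame written with the MongoDB Spark Connector (2.x/3.x) via `df.write.format("com.mongodb.spark.sql.DefaultSource").mode("append").save()`, which uses the `spark.mongodb.output.uri` set in the question.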

Hope this helps!


Thanks.

answered Aug 19, 2019 by Reshma
