Import zip files and process the Excel files (inside the zip files) using PySpark connected with pymongo

+2 votes

How can I import zip files and process the Excel files inside them using PySpark connected with pymongo?

I installed Spark, MongoDB, and Python to process files (Excel, CSV, or JSON).

I used this code to connect PySpark with MongoDB:

from pyspark.sql import SparkSession

my_spark = SparkSession \
    .builder \
    .appName("myApp") \
    .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/test.coll") \
    .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/test.coll") \
    .getOrCreate()

Next, I tried to import zip files and process the Excel files inside them, without having to open each file manually.

Aug 10, 2019 in Python by Ahmed
• 310 points
123 views

1 answer to this question.

0 votes

I found this sample code:

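import io
import zipfile

def zip_extract(x):
    # x is a (path, content) pair from sc.binaryFiles; x[1] holds the zip's raw bytes
    in_memory_data = io.BytesIO(x[1])
    with zipfile.ZipFile(in_memory_data, "r") as file_obj:
        # Map each member name inside the archive to its raw bytes
        return {name: file_obj.read(name) for name in file_obj.namelist()}

# sc is the SparkContext (with the session from the question: my_spark.sparkContext)
zips = sc.binaryFiles("dbfs:/mnt/vedant-demo/ONG/data/las_raw/D-Dfiles.zip")
files_data = zips.map(zip_extract)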

Check if this works. Source: https://gist.github.com/vedantja/bd74d0ba7c350dd348af1f92eadd0e76
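
Since the question is only about the Excel files inside the zip, one possible follow-up (not part of the gist) is to keep just those members. This is a minimal sketch using only the standard library; the helper names excel_members and make_demo_zip, the EXCEL_SUFFIXES list, and the demo file names are all hypothetical and only there for illustration.

```python
import io
import zipfile

# Assumption: these extensions are what counts as "Excel" here
EXCEL_SUFFIXES = (".xlsx", ".xls")

def excel_members(path_and_bytes):
    """Return [(member_name, raw_bytes), ...] for the Excel files in one zip.

    Expects a (path, content) pair, the shape produced by sc.binaryFiles.
    """
    _path, content = path_and_bytes
    with zipfile.ZipFile(io.BytesIO(content), "r") as archive:
        return [
            (name, archive.read(name))
            for name in archive.namelist()
            if name.lower().endswith(EXCEL_SUFFIXES)
        ]

def make_demo_zip():
    """Build a small in-memory zip so the helper can be tried without Spark."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("report.xlsx", b"fake-xlsx-bytes")  # placeholder, not a real workbook
        archive.writestr("notes.txt", b"ignore me")
    return buffer.getvalue()

# Only the .xlsx member should survive the filter
demo_result = excel_members(("demo.zip", make_demo_zip()))
```

With Spark, this could probably be applied as sc.binaryFiles(path).flatMap(excel_members), which yields one (name, bytes) pair per Excel file. Each pair's bytes could then be parsed with pandas.read_excel(io.BytesIO(data)), assuming pandas and an Excel engine such as openpyxl are installed, and the resulting rows written to MongoDB with pymongo's insert_many.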

answered Aug 19, 2019 by Reshma
