How can I import zip files and process the Excel files inside them using PySpark connected with pymongo?

+2 votes
How can I import zip files and process the Excel files inside them using PySpark connected with pymongo?

I have installed Spark, MongoDB, and Python to process files (Excel, CSV, or JSON).

I used this code to connect PySpark with MongoDB:

from pyspark.sql import SparkSession

# Build a session that reads from and writes to the local "test.coll" collection
my_spark = SparkSession \
    .builder \
    .appName("myApp") \
    .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/test.coll") \
    .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/test.coll") \
    .getOrCreate()

After that, I tried to import the zip files. I don't want to have to open every file by hand to process it.
Aug 6, 2019 in Apache Spark by Ahmed
145 views
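
One possible approach, offered as a minimal sketch rather than a confirmed solution: read each zip as raw bytes with `binaryFiles`, unpack it in memory with the standard `zipfile` module, parse each Excel member with pandas, and write the rows through the MongoDB Spark Connector. The function names (`excel_members`, `rows_from_zip`, `load_zips`) and the `zip_glob` parameter are hypothetical. The sketch assumes pandas and an Excel engine such as openpyxl are installed on every executor, that all sheets share the same columns, and that connector version 2.x/3.x is on the classpath (where the format name is `"mongo"`, matching the `spark.mongodb.output.uri` setting in the question).

```python
import io
import zipfile

EXCEL_SUFFIXES = (".xlsx", ".xls")


def excel_members(zip_bytes):
    """Yield (member_name, raw_bytes) for each Excel file in one zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if not info.is_dir() and info.filename.lower().endswith(EXCEL_SUFFIXES):
                yield info.filename, archive.read(info)


def rows_from_zip(path_and_bytes):
    """Turn one (path, content) pair from binaryFiles into plain dict rows."""
    import pandas as pd  # lazy import: only needed where the parsing runs

    path, content = path_and_bytes
    for name, data in excel_members(bytes(content)):
        # Assumption: only the first sheet matters; read every cell as text
        frame = pd.read_excel(io.BytesIO(data), dtype=str).fillna("")
        for record in frame.to_dict(orient="records"):
            row = {"zip_path": path, "excel_file": name}
            row.update({str(key): value for key, value in record.items()})
            yield row


def load_zips(spark, zip_glob):
    """Hypothetical helper: build a DataFrame from every zip matching zip_glob."""
    from pyspark.sql import Row  # lazy import: requires pyspark

    # binaryFiles yields one (path, whole-file bytes) pair per archive
    records = spark.sparkContext.binaryFiles(zip_glob).flatMap(rows_from_zip)
    return spark.createDataFrame(records.map(lambda d: Row(**d)))


# Usage, assuming the session from the question and a hypothetical input path:
# df = load_zips(my_spark, "/data/incoming/*.zip")
# df.write.format("mongo").mode("append").save()
```

Because `binaryFiles` loads each archive whole into memory on one executor, this only suits zips that fit comfortably in executor memory. Reading cells as text sidesteps schema-inference problems with mixed types, at the cost of casting columns afterwards.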

