How can I import zip files and process the excel files ( inside the zip files ) by using pyspark connecting with pymongo ?

+2 votes
How can I import zip files and process the excel files ( inside the zip files ) by using pyspark connecting with pymongo ?

I was install spark and mongodb and python to process the files (excel, csv or json)

I used this code to connect pyspark with mmongo :

from pyspark.sql import SparkSession

my_spark = SparkSession \
    .builder \
    .appName("myApp") \
    .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/test.coll") \
    .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/test.coll") \
    .getOrCreate()

but then I was try to import zip files ( I don't need to open every files to process it )
Aug 6, 2019 in Apache Spark by Ahmed
60 views

No answer to this question. Be the first to respond.

Your answer

Your name to display (optional):
Privacy: Your email address will only be used for sending these notifications.

Related Questions In Apache Spark

0 votes
1 answer

How can I compare the elements of the RDD using MapReduce?

You have to use the comparison operator ...READ MORE

answered May 24, 2018 in Apache Spark by Shubham
• 13,380 points
758 views
0 votes
0 answers
0 votes
1 answer

How can we optimize and minimize the memory when work with scala use case?

Hi, There is a term in Scala that is ...READ MORE

answered Jul 5, 2019 in Apache Spark by Gitika
• 26,430 points
67 views
0 votes
2 answers

In a Spark DataFrame how can I flatten the struct?

// Collect data from input avro file ...READ MORE

answered Jul 4, 2019 in Apache Spark by Dhara dhruve
2,044 views
0 votes
1 answer

How can I minimize data transfers when working with Spark?

Minimizing data transfers and avoiding shuffling helps ...READ MORE

answered Sep 19, 2018 in Apache Spark by zombie
• 3,750 points
383 views
+1 vote
1 answer

How can I write a text file in HDFS not from an RDD, in Spark program?

Yes, you can go ahead and write ...READ MORE

answered May 29, 2018 in Apache Spark by Shubham
• 13,380 points
2,627 views
0 votes
4 answers

How to change the spark Session configuration in Pyspark?

You can dynamically load properties. First create ...READ MORE

answered Dec 10, 2018 in Apache Spark by Vini
25,515 views
0 votes
1 answer

How to save and retrieve the Spark RDD from HDFS?

You can save the RDD using saveAsObjectFile and saveAsTextFile method. ...READ MORE

answered May 29, 2018 in Apache Spark by Shubham
• 13,380 points
4,081 views
0 votes
1 answer
0 votes
0 answers

Struck to do an application on amazon price tracking

Sir I want to do an application ...READ MORE

4 hours ago in Selenium by Likhitha
• 120 points
19 views