How to run a Spark job on EMR from EC2

I have created a Django application on an EC2 instance and use PySpark inside it for analytics. Now I want to run the Spark jobs on EMR instead. How can this be done?
Jun 25, 2020 in Apache Spark by Edureka



Hi,

You can follow the steps below to run your Spark code on an Amazon EMR cluster.

  • Upload your PySpark script (and any input files) to Amazon S3.

  • Open the Amazon EMR console.

  • Choose Create cluster.

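  • In the General configuration section, enter a cluster name, choose the S3 bucket you created (the logs will be stored in this bucket), and select Step execution as the launch mode.

  • In the Add steps section, select Spark application as the step type and click Configure. In the dialog, give the step a name, choose the deploy mode, set Application location to the S3 path of your script, add any arguments or spark-submit options your job needs, and choose the action on failure.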

  • In the Software configuration section, keep the default release.

  • In the Hardware configuration section, choose the instance type and the number of instances.

  • In the Security and access section, keep the default values.

  • Click Create cluster.

  • When the step has finished, go back to the S3 console. If your script writes its results to S3, you will see the output directory there; open it to view or download its contents.
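The console steps above can also be scripted from the EC2 instance itself, which is closer to what the question asks. Below is a minimal sketch with boto3, assuming boto3 is installed on the instance, the instance's IAM role is allowed to use S3 and EMR, and the default EMR roles already exist. The bucket, key, cluster name, step name, release label and instance type are hypothetical placeholders. It uploads the script, then calls run_job_flow with a single Spark step that runs spark-submit through command-runner.jar. KeepJobFlowAliveWhenNoSteps is set to False so the cluster shuts down once the step is done, much like the console's Step execution mode.

```python
# Sketch: the console walkthrough, scripted with boto3 from the EC2/Django host.
# Assumptions: boto3 is installed, the host's IAM role may call S3 and EMR, and the
# default EMR roles exist (create them once with `aws emr create-default-roles`).
from typing import Any, Dict, List, Optional


def build_spark_step(name: str, script_s3_uri: str,
                     script_args: Optional[List[str]] = None,
                     action_on_failure: str = "TERMINATE_CLUSTER") -> Dict[str, Any]:
    """Return an EMR step definition that runs spark-submit via command-runner.jar."""
    args = ["spark-submit", "--deploy-mode", "cluster", script_s3_uri]
    args.extend(script_args or [])
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {"Jar": "command-runner.jar", "Args": args},
    }


def build_cluster_request(cluster_name: str, log_uri: str, steps: List[Dict[str, Any]],
                          release_label: str = "emr-6.10.0",  # placeholder release
                          instance_type: str = "m5.xlarge",   # placeholder instance type
                          instance_count: int = 3) -> Dict[str, Any]:
    """Build keyword arguments for emr.run_job_flow() that mirror the console choices."""
    return {
        "Name": cluster_name,
        "LogUri": log_uri,  # S3 location for cluster logs
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # False: terminate the cluster after the last step ("Step execution").
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": steps,
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_spark_cluster(local_script: str, bucket: str, key: str,
                         region: str = "us-east-1") -> str:
    """Upload the script to S3, start a transient cluster, and return its cluster ID."""
    import boto3  # imported here so the pure helpers above work without boto3

    boto3.client("s3", region_name=region).upload_file(local_script, bucket, key)
    step = build_spark_step("analytics", f"s3://{bucket}/{key}")  # hypothetical step name
    request = build_cluster_request("django-analytics", f"s3://{bucket}/logs/", [step])
    response = boto3.client("emr", region_name=region).run_job_flow(**request)
    return response["JobFlowId"]
```

From the Django app you might call launch_spark_cluster("analytics.py", "my-bucket", "jobs/analytics.py") (placeholder names) and store the returned cluster ID so you can check on the run later.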
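The console route creates a new cluster for every run. If the Django app should instead send jobs to an EMR cluster that stays up, a step can be added to it from the EC2 instance with boto3's add_job_flow_steps and tracked with describe_step. This is a sketch, assuming emr_client is a boto3 EMR client, the cluster ID is known, the script is already in S3, and the EC2 instance's IAM role allows these EMR calls; the step name and helper names are hypothetical. ActionOnFailure is set to CONTINUE so a failed job does not bring down the shared cluster.

```python
# Sketch: submit a Spark step to an EMR cluster that is already running.
# Assumptions: `emr_client` is a boto3 EMR client (boto3.client("emr")), the cluster
# ID is known, and the script is already stored in S3.
import time
from typing import Any, Dict, List, Optional

# Step states after which describe_step will not report further progress.
TERMINAL_STATES = {"COMPLETED", "CANCELLED", "FAILED", "INTERRUPTED"}


def submit_spark_step(emr_client: Any, cluster_id: str, script_s3_uri: str,
                      script_args: Optional[List[str]] = None) -> str:
    """Add one spark-submit step to a running cluster and return its step ID."""
    step: Dict[str, Any] = {
        "Name": "django-analytics",     # hypothetical step name
        "ActionOnFailure": "CONTINUE",  # keep the shared cluster alive if the job fails
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     script_s3_uri, *(script_args or [])],
        },
    }
    response = emr_client.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]


def wait_for_step(emr_client: Any, cluster_id: str, step_id: str,
                  poll_seconds: float = 30.0, timeout_seconds: float = 3600.0) -> str:
    """Poll describe_step until the step reaches a terminal state and return that state."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = emr_client.describe_step(ClusterId=cluster_id, StepId=step_id)
        state = status["Step"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"step {step_id} still {state} after {timeout_seconds}s")
        time.sleep(poll_seconds)
```

In a Django view you would usually call submit_spark_step and return the step ID right away, then check the state later (for example from a background task) rather than blocking the request inside wait_for_step.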

answered Jun 25, 2020 by MD
