How to run a Spark job from EC2 on EMR

0 votes
I have created a Django application on an EC2 instance, and I use PySpark inside it for analytics. Now I want to run the Spark jobs on EMR. How can this be done?
Jun 24, 2020 in Apache Spark by Edureka
• 120 points

1 answer to this question.

0 votes


You can follow the steps below to run your Spark code on an EMR cluster.

  • Upload your files to Amazon S3.

  • Open the Amazon EMR console.

  • Choose Create cluster.

  • In the General Configuration section, enter the cluster name, choose the S3 bucket you created (the logs will be stored in this bucket), and check Step execution.

  • In the Add steps section, select Spark application, click Configure, and fill in the popup with the S3 location of your application and any arguments it needs.

  • In the Software Configuration section, use the default release.

  • In the Hardware configuration section, choose the instance type and the number of instances.

  • In the Security and access section, use the default values.

  • Click Create cluster.

  • Once the step has finished, go back to the S3 console. You will see the output directory in which the result has been stored; click on it to download its contents.
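
Since the question is about triggering the job from a Django application on EC2, the same console steps can probably also be done from Python with boto3. The following is only a minimal sketch, not a tested production setup. The helper names (build_spark_step, build_transient_cluster, launch_cluster), the bucket paths, the release label, the instance type, the region, and the default role names EMR_EC2_DefaultRole / EMR_DefaultRole are all assumptions for illustration; replace them with the values from your own account.

```python
"""Sketch: run a PySpark script stored on S3 as an EMR step, driven from Python."""


def build_spark_step(name, script_uri, script_args=()):
    """Build an EMR step that calls spark-submit through command-runner.jar."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     script_uri, *script_args],
        },
    }


def build_transient_cluster(name, log_uri, step,
                            instance_type="m5.xlarge",     # assumed value
                            instance_count=3,              # assumed value
                            release_label="emr-6.15.0"):   # assumed value
    """Build run_job_flow arguments for a cluster that stops after its step.

    This mirrors the console's "Step execution" option.
    """
    return {
        "Name": name,
        "LogUri": log_uri,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # False means the cluster terminates when all steps are done.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [step],
        # Assumed: the default EMR roles already exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(cluster_kwargs, region_name="us-east-1"):
    """Create the cluster and return its id. Needs boto3 and AWS credentials."""
    import boto3  # imported here so the builders above work without boto3

    emr = boto3.client("emr", region_name=region_name)
    response = emr.run_job_flow(**cluster_kwargs)
    return response["JobFlowId"]


if __name__ == "__main__":
    # Hypothetical bucket and script names.
    step = build_spark_step(
        "analytics-job",
        "s3://my-bucket/jobs/analytics.py",
        ["--output", "s3://my-bucket/output/"],
    )
    config = build_transient_cluster("analytics-cluster",
                                     "s3://my-bucket/logs/", step)
    print(config["Steps"][0]["HadoopJarStep"]["Args"])
    # To actually start the cluster: launch_cluster(config)
```

If you already have a long-running cluster, the same step dictionary can instead be passed to the boto3 EMR client's add_job_flow_steps(JobFlowId=..., Steps=[step]), which avoids creating a new cluster for every job. The EC2 instance running Django needs an IAM role (or credentials) that is allowed to call EMR.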

answered Jun 25, 2020 by MD
• 95,220 points
