How to run a Spark job on EMR from an EC2 instance?

0 votes
I have created a Django application on an EC2 instance, and I use PySpark inside it for analytics. Now I want to run the Spark jobs on EMR instead. How can this be done?
Jun 24 in Apache Spark by Edureka
• 120 points
639 views

1 answer to this question.

0 votes

Hi,

You can follow the steps below to run your Spark code on an EMR cluster.

  • Upload your files to Amazon S3.

  • Open the Amazon EMR console.

  • Choose Create cluster.

  • In the General Configuration section, enter the cluster name, choose the S3 bucket you created (the logs will be stored in this bucket), and select Step execution.

  • In the Add steps section, select Spark application, click Configure, and fill in the dialog with the S3 location of your application and any arguments it needs.

  • In the Software Configuration section, use the default release.

  • In the Hardware configuration section, choose the instance type and the number of instances.

  • In the Security and access section, use the default values.

  • Click Create cluster.

  • Once the step has finished, go back to the S3 console. You will see the output directory in which the result has been stored; click on it to download its contents.
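If you would rather trigger the job from the Django code on the EC2 instance than from the console, the same idea can be sketched with boto3. This is only a rough sketch under some assumptions that are not in the steps above: it submits a step to an EMR cluster that is already running (rather than creating a cluster with Step execution), the script has already been uploaded to S3, and the EC2 instance has IAM permissions to call EMR. The function names, the region default, the bucket and the cluster id are all placeholders.

```python
def build_spark_step(script_s3_uri, app_args=None, name="Spark application"):
    """Build an EMR step definition that runs spark-submit on the cluster."""
    # command-runner.jar lets an EMR step run a command such as spark-submit.
    args = ["spark-submit", "--deploy-mode", "cluster", script_s3_uri]
    args.extend(app_args or [])
    return {
        "Name": name,
        # Keep the cluster running if the step fails (assumed choice).
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {"Jar": "command-runner.jar", "Args": args},
    }


def submit_spark_step(cluster_id, script_s3_uri, app_args=None, region="us-east-1"):
    """Submit the step to an existing cluster and return the new step id."""
    # Imported here so build_spark_step can be used without boto3 installed.
    import boto3

    emr = boto3.client("emr", region_name=region)
    response = emr.add_job_flow_steps(
        JobFlowId=cluster_id,
        Steps=[build_spark_step(script_s3_uri, app_args)],
    )
    return response["StepIds"][0]
```

From a Django view or a background task you could then call something like submit_spark_step("j-XXXXXXXXXXXXX", "s3://my-bucket/jobs/analytics.py"), where both values are hypothetical, and read the results from the S3 output location your script writes to.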

answered Jun 25 by MD
• 78,990 points
