How to run a Spark job from EC2 on EMR?

I have created a Django application on an EC2 instance, and I use PySpark inside it for analytics. I now want to run the Spark jobs on EMR instead. How can this be done?
Jun 24 in Apache Spark by Edureka

1 answer to this question.

Hi,

You can follow the steps below to run your Spark code on an EMR cluster.

  • Upload your files (the Spark script and any input data) to Amazon S3.

  • Open the Amazon EMR console.

  • Choose Create cluster.

  • In the General Configuration section, enter the cluster name, choose the S3 bucket you created (the logs will be stored in this bucket), and select Step execution.

  • In the Add steps section, select Spark application, click Configure, and fill in the popup with the details of your application, such as its location in S3 and any arguments.

  • In the Software Configuration section, use the default release.

  • In the Hardware configuration section, choose the instance type and the number of instances.

  • In the Security and access section, use the default values.

  • Click Create cluster.

  • Once the step has finished, go back to the S3 console. You will see the output directory in which the result has been stored; click on it to download its contents.
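
Since the job is meant to be triggered from a Django application on EC2 rather than by hand, the same console steps can also be approximated in code with boto3. The following is only a minimal sketch and not part of the steps above: the release label, instance type, instance count, region, and the default role names `EMR_EC2_DefaultRole` and `EMR_DefaultRole` are assumptions that must match your account, and the helper function names are hypothetical. It also assumes the EC2 instance has IAM permissions to call EMR.

```python
def build_spark_step(script_s3_uri, script_args=(), name="Spark application"):
    """Build an EMR step that runs spark-submit on a PySpark script stored in S3."""
    return {
        "Name": name,
        # Assumption: shut the cluster down if the step fails.
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            # command-runner.jar lets a step run a command such as spark-submit.
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     script_s3_uri, *script_args],
        },
    }


def build_cluster_request(cluster_name, log_uri, step,
                          release_label="emr-6.15.0",  # assumption, pick your own
                          instance_type="m5.xlarge",   # assumption
                          instance_count=3):           # assumption
    """Build the keyword arguments for the EMR run_job_flow call."""
    return {
        "Name": cluster_name,
        "LogUri": log_uri,  # S3 location where the cluster logs are written
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # False mirrors "Step execution": terminate once the steps are done.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [step],
        # Assumption: the default EMR roles already exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_spark_job(request, region_name="us-east-1"):
    """Create the cluster, run the step, and return the cluster (job flow) ID."""
    import boto3  # imported here so the builders above work without boto3

    emr = boto3.client("emr", region_name=region_name)
    response = emr.run_job_flow(**request)
    return response["JobFlowId"]
```

From a Django view or management command you could then call something like `launch_spark_job(build_cluster_request("analytics", "s3://your-bucket/logs/", build_spark_step("s3://your-bucket/job.py")))`, where the bucket and script names are placeholders. If you already have a long-running cluster, the same step dictionary can be passed to the EMR client's `add_job_flow_steps` call instead of creating a new cluster each time.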

answered Jun 25 by MD
