Where can I find AMI for Hadoop on EC2

0 votes

I am trying to set up Hadoop permanently on Amazon EC2. Currently what I am doing is every morning launch EC2 instances and set up Hadoop. Is there any way i can avoid this tedious step? I am looking for an Hadoop image which can be loaded on EC2 and make things easy for me.

I know I can use EMR for hadoop services. But I dont know how to start a EMR (hadoop) cluster without submitting a job flow. I mean I need a hadoop cluster without any jobs running in it.

Ultimately my aim is to run bioinformatics applications like Distmap and Seal. For these applications to run there are many dependencies. So I need a free hadoop cluster to set up the environment and then run these applications. I hope its clear what I am trying to do.

Nov 12, 2018 in Big Data Hadoop by Neha
• 6,300 points
838 views

1 answer to this question.

0 votes
To find that, please try with below choices

 1. Start out with an EBS backed EC2 instance with your favorite Linux distro. Go ahead and install Hadoop software that you need. Create as many EC2 instances as the types of instances you are going to need (master / slaves /etc). You can create then your own AMIs in the AWS Console (right click on the EC2 instance and click "Create AMI"). You can then launch your own instances, as many as you need, based on this AMI. You can also create AMI's from instance-store backed instances, but that will mean dumping everything to S3 and creating an AMI from there. There are a lot of tutorials about this available, please leave a comment if you need directions :)

 2. Start out with a Hadoop based AMI, repeat the steps above after doing your own configurations / adding dependencies to them. I went ahead and searched for Hadoop AMI's from the AWS console and there are 48 in EU-West-1 (not sure what region you're working with).

3. Start an EMR Cluster in interactive mode. There is also an option to keep the cluster alive after finishing job flows. If you also set the EC2 keys for the EMR instances, you should be able to SSH into them and have a functional Hadoop cluster.
answered Nov 12, 2018 by Frankie
• 9,830 points

Related Questions In Big Data Hadoop

0 votes
1 answer

Where can I find hadoop-streaming.jar JAR file?

You will find the streaming jar here: ...READ MORE

answered Mar 21, 2018 in Big Data Hadoop by coldcode
• 2,080 points
6,445 views
0 votes
1 answer

Where can I find logs in Spark on YARN?

You can access logs through the command yarn ...READ MORE

answered Nov 8, 2018 in Big Data Hadoop by Omkar
• 69,210 points
842 views
0 votes
1 answer

How can I download hadoop documentation for a specific version?

You can go through this SVN link:- ...READ MORE

answered Mar 22, 2018 in Big Data Hadoop by Shubham
• 13,490 points
607 views
0 votes
1 answer

In Hadoop MapReduce, how can i set an Object as the Value for Map output?

Try this and see if it works: public ...READ MORE

answered Nov 21, 2018 in Big Data Hadoop by Omkar
• 69,210 points
749 views
0 votes
1 answer

Using Shapely on AWS Lambda with Python 3

For some reason, the pip install of ...READ MORE

answered Oct 8, 2018 in AWS by Priyaj
• 58,090 points
2,563 views
0 votes
1 answer
0 votes
1 answer

Where can I find older versions of Hadoop?

You can check here. From the archives. In particular, ...READ MORE

answered Sep 27, 2018 in Big Data Hadoop by Frankie
• 9,830 points
473 views
+1 vote
1 answer
webinar REGISTER FOR FREE WEBINAR X
REGISTER NOW
webinar_success Thank you for registering Join Edureka Meetup community for 100+ Free Webinars each month JOIN MEETUP GROUP