Where can I find AMI for Hadoop on EC2

Question

I am trying to set up Hadoop permanently on Amazon EC2. Currently what I am doing is every morning launch EC2 instances and set up Hadoop. Is there any way i can avoid this tedious step? I am looking for an Hadoop image which can be loaded on EC2 and make things easy for me.

I know I can use EMR for hadoop services. But I dont know how to start a EMR (hadoop) cluster without submitting a job flow. I mean I need a hadoop cluster without any jobs running in it.

Ultimately my aim is to run bioinformatics applications like Distmap and Seal. For these applications to run there are many dependencies. So I need a free hadoop cluster to set up the environment and then run these applications. I hope its clear what I am trying to do.

Frankie · Answer 1 · Nov 12, 2018

To find that, please try with below choices

1. Start out with an EBS backed EC2 instance with your favorite Linux distro. Go ahead and install Hadoop software that you need. Create as many EC2 instances as the types of instances you are going to need (master / slaves /etc). You can create then your own AMIs in the AWS Console (right click on the EC2 instance and click "Create AMI"). You can then launch your own instances, as many as you need, based on this AMI. You can also create AMI's from instance-store backed instances, but that will mean dumping everything to S3 and creating an AMI from there. There are a lot of tutorials about this available, please leave a comment if you need directions :)

2. Start out with a Hadoop based AMI, repeat the steps above after doing your own configurations / adding dependencies to them. I went ahead and searched for Hadoop AMI's from the AWS console and there are 48 in EU-West-1 (not sure what region you're working with).

3. Start an EMR Cluster in interactive mode. There is also an option to keep the cluster alive after finishing job flows. If you also set the EC2 keys for the EMR instances, you should be able to SSH into them and have a functional Hadoop cluster.