Operation to simulate a Hadoop Production Cluster

Question

I want to run Hadoop jobs on my development workstation for testing before I submit them to my production cluster. Which mode of operation in Hadoop most closely simulates a production cluster on a single machine?

Can anyone help?

nitinrawat895 · Answer 1 · Aug 9, 2018

In this case, you can run all the nodes of your production cluster as virtual machines on your development workstation.

Besides large-scale cloud infrastructures, there is another deployment pattern: local VMs on desktop systems or other development machines. This is a good tactic if your physical machines run Windows and you need to bring up a Linux system running Hadoop, and/or you want to simulate the complexity of a small Hadoop cluster.

Give the VM enough RAM that it does not swap. Don't try to run more than one VM per physical host; it will only make things slower. Use file: URLs to access persistent input and output data. Consider making the default filesystem a file: URL so that all storage is really on the physical host. This is often faster and preserves data better, since the data lives outside the VM.
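To make the default-filesystem tip concrete, here is a rough sketch that generates a minimal core-site.xml with fs.defaultFS pointing at a file: URL. fs.defaultFS is the standard Hadoop property for the default filesystem; the helper name build_core_site and the choice of file:/// as the value are just my assumptions for illustration, so adjust them for your setup.

```python
import xml.etree.ElementTree as ET


def build_core_site(default_fs: str = "file:///") -> str:
    """Return the text of a minimal core-site.xml that sets fs.defaultFS.

    build_core_site is a hypothetical helper, not part of Hadoop.
    """
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    # fs.defaultFS selects the filesystem used for unqualified paths.
    ET.SubElement(prop, "name").text = "fs.defaultFS"
    # A file: URL keeps all storage on the local (physical host) filesystem.
    ET.SubElement(prop, "value").text = default_fs
    return ET.tostring(configuration, encoding="unicode")


if __name__ == "__main__":
    # Print the XML; save it as core-site.xml in your Hadoop configuration directory.
    print(build_core_site())
```

With this in place, job input and output paths resolve against the local filesystem instead of HDFS unless you give a full URL with another scheme.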

Hope this answers your query to some extent.