Published on Oct 09,2014
Email Post

Hadoop is now one of the most trending topics in the IT world! Hadoop has matured massively and has proved to be useful for diverse projects within a short span of time and as a result there is a huge demand for Hadoop Administration skill. The Hadoop community is fast developing and plays a prominent role in its domain.

How has Hadoop Overcome the Limitations of Traditional Technologies?

In the earlier data analytics architecture, there were certain limitations like not being able to explore original high fidelity raw data, computable data not being scalable and premature data death. In a traditional system, failures occurred during data analytics process and this was a major limitation.

The image below depicts how Hadoop overcomes the limitations faced by the traditional technology.

Why Hadoop?

Hadoop Features

The following are the features that make Hadoop a popular choice when it comes to handling Big Data:

  • Reliable
  • Flexible
  • Economical
  • Scalable

Top Hadoop Users:

  • Amazon
  • eBay
  • LinkedIn
  • Morgan
  • Yahoo!
  • Facebook
  • IBM
  • Rackspace
  • Google

Job Trend:

Hadoop Admin Job Trend

How to Become a Hadoop Administrator?

The following skills are essential to become a Hadoop Admin:

  • Understanding of Capacity Planning.
  • Good troubleshooting skills,
  • Hadoop skills like HBase, Hive, Pig, Mahout, etc.
  • Capable of deploying Hadoop cluster, monitoring and scaling vital aspects of the cluster.
  • Good knowledge of Linux.
  • Expertise in open-source configuration management and deployment tools such as Puppet or Chef and Linux scripting.
  • understanding of Troubleshooting Core Java Applications is a plus.

More about skills required to become a Hadoop Administrator.

Hadoop Administrator Responsibilities:

Here are some of the tasks performed by Hadoop Administrators, every day:

  • Responsible for implementation and administration of Hadoop infrastructure.
  • Testing HDFS, Hive, Pig and MapReduce access for Applications.
  • Cluster maintenance tasks like Backup, Recovery, Upgrade, Patching.
  • Performance tuning and capacity planning for clusters.
  • Monitor Hadoop cluster and deploy security.

More about daily responsibilities of a Hadoop Administrator

Hadoop 1.X Vs Hadoop 2.X:

Hadoop 1.X Vs Hadoop 2.X:

Hadoop 2.X has the added feature, namely YARN (Yet Another Resource Negotiator). YARN takes out all the granular details managed by the Administrator and  makes the job of a Hadoop Administrator easy.

More about Hadoop 1.0 and Hadoop 2.0’s Core Components.

Hadoop 2.0 Cluster Architecture Federation:

Apache Hadoop 2.x consists of significant improvements over the previous stable release (Hadoop-1.x). Hadoop 2.0 has the concept of Federation, where multiple name nodes are present and are federated, i.e. independent from each other. The datanodes are present at the bottom. The datanodes are used as common storage point for blocks by all the Namenodes. Each datanode registers with all the Namenodes in the cluster. Datanodes transmit periodic heartbeats and block reports and handles commands from the Namenodes.

Hadoop 2.0 Cluster Architecture Federation

More about Hadoop 2.0 Cluster Architecture Federation

Hadoop 2.0 High Availability:

In Hadoop 1.0., the NameNode was a single point of failure (SPOF) in an HDFS cluster. Each cluster had a single NameNode and if there was some kind of failure at this point, the cluster would be unavailable until the NameNode was either restarted.

Hadoop 2.0 High Availability

Hadoop Cluster Modes:

Hadoop can run in any of the following 3 modes:

  • Standalone or Local Mode – There are no daemons running and everything runs in a single JVM. This mode is suitable for running MapReduce programs during development, since it’s easy to test and debug.
  • Pseudo Distributed Mode – Hadoop daemon runs in a separate Java process on local machine, thus simulating a cluster on a small scale.
  • Fully Distributed Mode – Hadoop runs on a cluster of machines. This is the mode in which Hadoop is being used by industries for doing real-world data processing. Typically, one machine in the cluster is designated as the NameNode and another machine as the JobTracker, exclusively.

Got a question for us? Mention them in the comments section and we will get back to you. 

Hadoop Administration Training

Job Responsibilities of a Hadoop Administrator

Skills Required to Become a Hadoop Administrator

About Author
Published on Oct 09,2014

Share on

Browse Categories