When I was younger I loved the Star Wars movie series. I loved it so much that I owned every one of the films on video cassette and watched them over and over again. Then came DVD players, and I had to get the movies again on DVD. I had no qualms about buying the entire set again, as I wanted to experience the new technology and continue to enjoy my favorite movies. Things were great until another new technology, Blu-ray, emerged. Having bought the movies twice already, I wasn’t looking forward to buying them all over again. For some time I made no effort to ‘update’ myself, thinking that I would do just fine without this technology. And I did fine; it didn’t really affect my life. But I could see the trend changing, and I couldn’t share my movies with my friends the way they were sharing with each other. I really felt left out.
I finally did go out and get the movies on Blu-ray. I remember it not because I love the movies, but because the quality of the video was mind-blowing. And to top it all, I got the entire set of movies on a single Blu-ray Disc.
The urge to stay updated is strongest when it comes to our profession, as the risks are greater. It has become essential to stay on top of your game.
4 Practical Reasons for Learning Hadoop 2.0:
Unavoidable as it may be, staying up-to-date in our profession has become an important part of our lives. Daunting as that may sound, there is no need for alarm, as technologies don’t really change that fast or that drastically. But the talk and information about these technologies, and the things we can do with them, are gaining visibility. Here are some reasons why you should stay up-to-date:
#1: Don’t Get Caught Out
Not paying attention to the latest updates in a technology can make you look like a deer caught in the headlights, which is not exactly a flattering picture of your professional capabilities. Being up-to-date will earn you the respect of your peers for your professional skills. You may not need to implement every new thing you learn, but being aware of the updates is imperative.
For example, when the talk turns to Hadoop, you can let your peers know that Hadoop 2.5.0 has authentication improvements when using an HTTP proxy server. That same version also adds a metrics sink that can write directly to Graphite.
Staying on top of the latest updates becomes essential when organizations are thinking of migrating to Hadoop, and ‘knowing’ can make a huge difference to your career.
#2: Having a Competitive Edge
Professionals who are skilled in their fields are respected, and staying up-to-date is the best way to stay on top. Your drive to stay updated reflects your passion for your job. By developing expertise in your job and your industry, you’ll earn the trust and respect of the people around you. From a leadership perspective, this is invaluable!
Even if your organization is still working with Hadoop 1.0, knowing the latest features of Hadoop 2 will keep you on track, as it is relatively new and a clear improvement. Being among the first to learn it gives you an edge over your peers.
#3: New Opportunities
It’s a reality that our roles keep changing. With time come added responsibilities and opportunities to take on new tasks. By staying up-to-date on industry trends, you’re in the best position to seize these opportunities.
Companies like Macy’s, Lockheed Martin, California Creative Solutions, Capital One, CSpring, CACI International Inc., Oracle, Yahoo!, American Express, BlueHawk, Aetna, Lawrence Livermore National Laboratory and many more are looking for people skilled in the latest features of Hadoop 2, like YARN.
#4: Make Better Decisions
The extra information will allow you to make informed choices and better decisions. It will help you to recognize opportunities and add value to your organization’s strategy.
Hadoop 2 has features that improve speed as well as cut costs. Suggesting options that improve performance can boost the organization’s productivity. Here are some features of Hadoop 2 that will benefit the organization; suggesting them can boost your career as well.
Support for running Hadoop on Microsoft Windows
Simplified distribution of MapReduce binaries via HDFS in YARN Distributed Cache.
Enhanced support for new applications on YARN with Application History Server and Application Timeline Server
Complete HTTPS support in HDFS
Kerberos integration for YARN’s timeline store.
Support for Heterogeneous Storage hierarchy in HDFS.
In-memory cache for HDFS data with centralized administration and management.
There are more than just practical reasons for staying updated; there are technical reasons as well. Hadoop has numerous features that are advantageous to organizations. Taking an in-depth look at them will give you a clear picture of what those advantages are.
What’s the latest update in Hadoop?
Every product goes through a series of releases and versions. Hadoop is no exception, and has arrived at Hadoop 2.0. The Apache foundation has followed it with subsequent versions such as Hadoop 2.1.0 and Hadoop 2.4.0, and has reached Hadoop 2.5.1, which was released in September 2014 and is the latest version at the time of writing.
Why was Hadoop 2 released?
With each new version come added features and bug fixes. So every time you use a particular version of Hadoop and think that a certain feature could be added or a bug needs to be fixed, you can let the people at the Apache foundation know about it. They in turn work on it and give you a better product in the next version.
Hadoop 2 – Not Just a Number
Hadoop 2 is not just the latest version of Hadoop. By and large, it is a second-generation architecture. Arun Murthy, founder and architect at Hadoop distributor Hortonworks, insists that the distinction is important because the amount of re-engineering required to move Hadoop beyond batch processing and into the world of real-time analytics has been substantial.
Let’s discuss how Hadoop 2.0 differs from its predecessor, Hadoop 1.0. As you would expect, the later release is superior to the earlier one. The following are the four major improvements in Hadoop 2.0 over Hadoop 1.x:
HDFS Federation – Horizontal scalability of NameNode
NameNode High Availability – NameNode is no longer a Single Point of Failure
YARN – Ability to process terabytes and petabytes of data available in HDFS using non-MapReduce applications such as MPI and Giraph
Resource Manager – Splits up the two major functionalities of overburdened JobTracker (resource management and job scheduling/monitoring) into two separate daemons: a global Resource Manager and per-application ApplicationMaster
There are additional features such as the Capacity Scheduler (which enables multi-tenancy support in Hadoop), data snapshots, support for Windows, and NFS access, all of which encourage wider Hadoop adoption in the industry to solve Big Data problems.
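The split between a global Resource Manager and a per-application ApplicationMaster can be pictured with a small toy model. This is only a sketch in plain Python, not the YARN API: the class names, the `allocate`, `release` and `run` methods, and the idea of counting "containers" as simple integers are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """Toy global resource manager: it only tracks container slots (hypothetical)."""
    total_containers: int
    allocated: dict = field(default_factory=dict)  # app_id -> containers held

    def free_containers(self) -> int:
        return self.total_containers - sum(self.allocated.values())

    def allocate(self, app_id: str, requested: int) -> int:
        """Grant up to `requested` containers, never more than are free."""
        granted = min(requested, self.free_containers())
        self.allocated[app_id] = self.allocated.get(app_id, 0) + granted
        return granted

    def release(self, app_id: str) -> None:
        """Return every container held by an application to the pool."""
        self.allocated.pop(app_id, None)


@dataclass
class ApplicationMaster:
    """Toy per-application master: it schedules only its own tasks (hypothetical)."""
    app_id: str
    framework: str  # e.g. "MapReduce", "MPI", "Giraph"
    pending_tasks: int

    def run(self, rm: ResourceManager) -> int:
        """Run tasks in waves of granted containers; return the number of waves."""
        waves = 0
        while self.pending_tasks > 0:
            granted = rm.allocate(self.app_id, self.pending_tasks)
            if granted == 0:
                break  # nothing free right now; a real system would wait and retry
            self.pending_tasks -= granted
            rm.release(self.app_id)  # the wave finished, hand the containers back
            waves += 1
        return waves


if __name__ == "__main__":
    rm = ResourceManager(total_containers=10)
    for am in (ApplicationMaster("app-1", "MapReduce", 25),
               ApplicationMaster("app-2", "Giraph", 4)):
        print(am.framework, "finished in", am.run(rm), "wave(s)")
```

The point of the design is that the resource manager knows nothing about MapReduce or any other framework. It only hands out capacity, which is what lets different kinds of applications share one cluster.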
Hadoop 2.X Vs Hadoop 1.X
Let’s do a small comparison and see in what ways Hadoop 2.0 is different from, and better than, Hadoop 1.0.
Why is Hadoop 2 preferred over Hadoop 1.0?
Hadoop 2.0 offers performance improvements that benefit related technologies in the Hadoop ecosystem. Besides the groundbreaking HDFS features and the second-generation architecture (YARN), there are further reasons for preferring Hadoop 2 over Hadoop 1.0:
Hadoop 2 no longer has a language restriction, which means a wider range of professionals can now use Hadoop.
With Hadoop 2, obstacles like shortage of MapReduce coders are overcome.
Roughly 2 times the job throughput of Hadoop 1.0, going by the Yahoo! figures that Murthy cites.
2 times the ROI with existing hardware.
With YARN, the application-programming interface is much more open and flexible.
Hadoop 2 expands the possibilities for using Hadoop in Big Data projects.
With Hadoop 2, developers can now perform a huge variety of data-crunching tasks, beyond Hadoop’s previous scope of batch processing.
Offers new opportunities for information managers and addresses shortcomings in previous versions.
This new release has the unique feature of running multiple workloads on the same Hadoop cluster.
Hadoop is no longer restricted to one processing model; its applications now extend beyond HDFS and MapReduce.
Key Benefits of YARN
We know that YARN is a second-generation architecture. Let’s see what makes it so great.
New Programming models and services
Enhanced cluster usage
Much more than Java
And many more
Demand for Hadoop 2 Skills
Organizations are now launching or experimenting with Hadoop 2. Consequently, there is a need for professionals skilled in Hadoop 2, and many organizations have already begun looking internally for people to work with Hadoop. There are clear indications that demand for YARN skills is on the rise and will eventually overtake the demand for MapReduce skills.
Here are some views on the current and projected demand for Hadoop skills:
According to analysts from Gartner, Hadoop 2 is a vital development as big enterprises around the globe have found Hadoop to be a game changer in their Big Data management.
According to Eric Kavanagh of the Bloor Group, Hadoop 2.0 has gained traction among information workers seeking to wrangle Big Data.
Hadoop 2.0 adoption continues to be on the rise and is now entering the stage of maturity.
Organizations are aware of the benefits of YARN and are excited about it.
Here’s a snapshot of job openings for Hadoop 2.0/YARN on Indeed.com.
Who is moving to Hadoop 2 or already has?
Yahoo!, the leader in all things Hadoop, has implemented YARN (0.23.x). According to Murthy, Yahoo’s 35,000-node cluster now processes 130-150 jobs per day compared to 50-60 prior to YARN.
Talking about this stellar performance, Murthy says, “When you’ve got 2x over 35,000 to 40,000 nodes, that’s phenomenal.” He also added, “It’s a pretty compelling story to tell a CIO that if you just upgrade your software from Hadoop 1 to Hadoop 2, you’ll see 2 times throughput improvements in your jobs.”
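As a quick sanity check on the "2x" figure, the job counts quoted above can be turned into a range of ratios. The helper name `speedup_range` is made up for this illustration, and the calculation assumes the before and after ranges are directly comparable.

```python
def speedup_range(before, after):
    """Return the (lowest, highest) ratio of after to before.

    Both arguments are (low, high) tuples taken from the quoted figures.
    """
    before_low, before_high = before
    after_low, after_high = after
    # Worst case: smallest "after" against largest "before", and vice versa.
    return after_low / before_high, after_high / before_low


low, high = speedup_range(before=(50, 60), after=(130, 150))
print(f"{low:.2f}x to {high:.2f}x")  # prints "2.17x to 3.00x"
```

On these numbers, "2 times" sits at the conservative end of the range.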
eBay has one of the largest Hadoop clusters in the industry, where the data is in petabytes. They have also migrated their clusters to Hadoop 2.
With Hadoop 2 being mature and easier to implement, even the skeptics are being convinced, and more and more organizations are migrating to Hadoop 2.0. There were valid reasons to avoid the 1.x versions, but Hadoop 2 is winning over even the unbelievers because it can be used for a wide range of workloads. By learning Hadoop 2.0 and using it to perform computations on Big Data, you will be opening the gates to a technically advanced and financially rewarding career.
Got a question for us? Please mention it in the comments section and we will get back to you.