Hadoop Distribution Differences


Can somebody outline the differences between the various Hadoop distributions available, using the Apache Hadoop distro as a baseline?

Is there a good reason for using one of these distributions over the standard Apache Hadoop distro?

Feb 17 in Big Data Hadoop by Neha

1 answer to this question.


The Yahoo distribution is a version of Hadoop 0.20 that Yahoo runs (ran?) on some subset of its clusters. It includes a set of patches for stability, bug fixes, etc. It is a source release; it does not have admin-friendly features such as RPM or Debian packages.

The Cloudera distribution is packaged as RPMs and DEBs (the source is also available). This means you can get updates through your system's standard package manager. It also includes stability and bug-fix patches. It is actively maintained (not to say Yahoo's isn't; I suppose one could just go on GitHub and check when it was last updated). It also packages Pig and Hive.

Cloudera's distribution of Hadoop 0.20 is in beta, and 0.18 is considered stable (more on this on the Cloudera blog). The 0.18 version also includes packages for Hive and Pig; for 0.20, you have to build them yourself (there aren't yet official releases of Pig or Hive that support 0.20, although patches exist). There may well be significant overlap between the Cloudera and Yahoo versions of 0.20; both provide manifests, so you can check. The latest documentation for Cloudera's distros is at http://archive.cloudera.com
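Checking that overlap amounts to a set comparison of the patches each manifest lists. Below is a minimal Python sketch, assuming each manifest is a plain-text file that mentions applied patches by their Apache JIRA keys (e.g. HADOOP-1234); the real manifest layouts may differ, and the function names and file names in the usage line are hypothetical.

```python
import re
import sys
from pathlib import Path

# Assumption: manifests are plain text that mention patches by Apache JIRA key
# (e.g. "HADOOP-1234", "HDFS-56"); the actual manifest formats may differ.
JIRA_KEY = re.compile(r"\b(?:HADOOP|HDFS|MAPREDUCE)-\d+\b")


def patch_ids(manifest_text: str) -> set[str]:
    """Return the set of JIRA keys mentioned anywhere in a manifest."""
    return set(JIRA_KEY.findall(manifest_text))


def compare_manifests(a_text: str, b_text: str) -> dict[str, set[str]]:
    """Split patches into those shared by both manifests and those unique to each."""
    a, b = patch_ids(a_text), patch_ids(b_text)
    return {"both": a & b, "only_a": a - b, "only_b": b - a}


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python compare_manifests.py yahoo.txt cloudera.txt
    result = compare_manifests(Path(sys.argv[1]).read_text(),
                               Path(sys.argv[2]).read_text())
    for name, keys in result.items():
        print(f"{name}: {len(keys)} patches")
        for key in sorted(keys):
            print(f"  {key}")
```

Matching on JIRA keys rather than whole lines means the comparison still works if the two vendors describe the same upstream patch with different wording.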

Yahoo does not provide support for its distribution; it provides its patched version as a service to the community, so that interested users can build what Yahoo runs internally. Given the size of Yahoo's clusters, that's a significant contribution, especially if you aren't a Hadoop developer who follows the JIRAs all the time. Cloudera supports its distribution commercially, and also provides some community support via the Hadoop mailing lists and, for distro-specific issues, on its GetSatisfaction page.

Both are pretty different from the vanilla Apache distro because they apply patches in between Apache releases (the Cloudera version of 0.20 has 60+ patches!).

answered Feb 17 by Frankie