There is so much information on Hadoop out there, and we have talked about it a lot on our blog as well. In spite of this elephantine (pun intended) volume of information, there still seem to be some misconceptions, or rather a lack of clarity, among some professionals and their counterparts about where Hadoop fits into the overall Big Data landscape.
Let’s go ahead and debunk some of the popular myths about Hadoop.
Myth #1: Hadoop is a Database
Hadoop is neither a database nor a replacement for any database system. Hadoop is primarily a distributed file system, and it lacks database features such as query optimization, indexing and fast random access to individual records. However, Hadoop can be used to build a database system; HBase, for example, runs on top of HDFS.
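To make "distributed file system" a little more concrete, here is a toy sketch of how such a system lays out a file: the file is cut into fixed-size blocks, and each block is copied to several nodes. The function names and the round-robin placement are hypothetical simplifications, not the HDFS API (real HDFS uses rack-aware placement). The replication factor of 3 matches HDFS's default `dfs.replication`, and real HDFS blocks are 128 MB by default rather than the tiny sizes used here. Notice what the layout does not contain: it tracks which nodes hold which blocks, but nothing about the records inside them, so there is no index to look up a single row.

```python
# Toy model of how a distributed file system such as HDFS lays out a file.
# Not the HDFS API: function names and round-robin placement are hypothetical.


def blocks_needed(file_size: int, block_size: int) -> int:
    """Number of fixed-size blocks needed to hold a file (ceiling division)."""
    return -(-file_size // block_size)


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a file into fixed-size blocks; only the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str],
                   replication: int = 3) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.

    Real HDFS uses rack-aware placement; round-robin is a simplification.
    The result maps block index -> nodes; it says nothing about record
    contents, which is why a plain file system offers no record index.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    # A 1 GB file with 128 MB blocks (sizes given in MB) needs 8 blocks.
    print(blocks_needed(1024, 128))  # 8
    blocks = split_into_blocks(b"x" * 40, 16)
    print([len(b) for b in blocks])  # [16, 16, 8]
    print(place_replicas(len(blocks), ["n1", "n2", "n3", "n4"]))
    # {0: ['n1', 'n2', 'n3'], 1: ['n2', 'n3', 'n4'], 2: ['n3', 'n4', 'n1']}
```

With three replicas per block, the 8-block file above would occupy 24 block copies across the cluster.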
Myth #2: Hadoop is a single solution
This is the biggest myth of all! Hadoop comprises a range of open source projects, such as HDFS (Hadoop Distributed File System), MapReduce, Pig, Hive, HBase, Ambari, Mahout, Flume and HCatalog. And that is just the tip of the iceberg. In other words, Hadoop is an ecosystem, not a single product.
Myth #3: Hadoop needs a bunch of programmers
This depends entirely on what the organization plans to do. If the plan is to build a sophisticated Hadoop-based Big Data suite, then programmers come into the picture. If not, programming need not be a worry at all, as most data integration tools offer GUIs and pre-built templates that hide the complexity of MapReduce programming.
Myth #4: Hadoop can only handle web analytics
When it comes to Hadoop, web analytics gets the spotlight because most companies use it to analyze web logs and other web data. But its applications are not limited to web analytics. Hadoop can handle a much wider range of data and analytics, which makes it appealing to a broader range of organizations.
Myth #5: Big Data can do without Hadoop
When we say Big Data, the first thing that comes to mind is Hadoop, in spite of the other options available in the market. Therefore, when dealing with Big Data, Hadoop has to be part of the picture.
Myth #6: Hive resembles SQL
People who know SQL can quickly pick up Hive. Hive's query language resembles SQL, but it does not conform to the SQL standard. Over time, Hadoop products are expected to support standard SQL, and SQL-based vendor tools are expected to support Hadoop.
Myth #7: Hadoop requires MapReduce
Hadoop and MapReduce are related, but they are not married to each other. That said, they do not exclude each other either. Some variations of MapReduce work with a variety of storage technologies, including HDFS and some relational DBMSs. Likewise, some users deploy HDFS with Hive or HBase but without MapReduce.
Myth #8: MapReduce is only for analytics
MapReduce is a general framework for parallel processing. It handles parallelization and fault tolerance for a wide variety of coded logic and applications, not just analytics.
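As a rough illustration of why MapReduce is not tied to analytics, the sketch below simulates its three phases (map, shuffle, reduce) in a single Python process. `run_mapreduce` is a hypothetical helper, not Hadoop's Java API: threads stand in for cluster nodes, and fault tolerance is not modeled. The same engine runs a word count (an analytics-style job) and an inverted index of the kind a search engine builds (not analytics at all); only the mapper and reducer change.

```python
# Single-process simulation of the MapReduce model (hypothetical helper,
# not Hadoop's API). Threads stand in for cluster nodes; no fault tolerance.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable

Pair = tuple[Any, Any]


def run_mapreduce(records: Iterable[Any],
                  mapper: Callable[[Any], Iterable[Pair]],
                  reducer: Callable[[Any, list[Any]], Any],
                  workers: int = 4) -> dict[Any, Any]:
    # Map phase: each record is processed independently, so it can run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(lambda rec: list(mapper(rec)), records))
    # Shuffle phase: group every emitted value under its key.
    groups: dict[Any, list[Any]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    # Reduce phase: one reducer call per key, keys visited in sorted order.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


# Job 1: word count.
def word_count_map(line: str):
    for word in line.lower().split():
        yield word, 1


def word_count_reduce(word: str, counts: list[int]) -> int:
    return sum(counts)


# Job 2: inverted index (word -> documents containing it).
def inverted_index_map(doc: tuple[str, str]):
    doc_id, text = doc
    for word in set(text.lower().split()):
        yield word, doc_id


def inverted_index_reduce(word: str, doc_ids: list[str]) -> list[str]:
    return sorted(doc_ids)


if __name__ == "__main__":
    print(run_mapreduce(["the cat sat", "the dog sat"],
                        word_count_map, word_count_reduce))
    # {'cat': 1, 'dog': 1, 'sat': 2, 'the': 2}
    print(run_mapreduce([("d1", "the cat"), ("d2", "the dog")],
                        inverted_index_map, inverted_index_reduce))
    # {'cat': ['d1'], 'dog': ['d2'], 'the': ['d1', 'd2']}
```

Keeping the engine generic and pushing all job-specific logic into the mapper and reducer is what lets one framework serve very different workloads.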
Myth #9: Hadoop is cheap
This is the most common misconception about anything open source: that it is either free or cheap. Making use of all the verticals of Hadoop takes careful budgeting. The software itself may be free, but the hardware to run it and the people to operate it are not, so free software doesn't mean a cheap or free deployment.
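One cost that free software does not remove is hardware. As a back-of-the-envelope sketch, assuming HDFS's default replication factor of 3, the hypothetical helper below estimates how many storage nodes a dataset needs. The `headroom` fraction (disk kept free for intermediate data and growth) is an assumed planning figure, not a Hadoop setting, and all input values are illustrative rather than benchmarks.

```python
# Back-of-the-envelope storage sizing (hypothetical helper, illustrative inputs).
import math


def nodes_needed(data_tb: float, disk_per_node_tb: float,
                 replication: int = 3, headroom: float = 0.25) -> int:
    """Estimate how many storage nodes are needed to hold `data_tb` of data.

    replication: copies kept of each block (HDFS's default is 3).
    headroom: fraction of raw disk left free for temporary data and growth;
              an assumed planning figure, not a Hadoop configuration value.
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    raw_tb = data_tb * replication / (1 - headroom)  # total disk to provision
    return math.ceil(raw_tb / disk_per_node_tb)


if __name__ == "__main__":
    print(nodes_needed(100, 24))  # 17
```

With 100 TB of data and 24 TB of disk per node, three-way replication alone turns 100 TB into 300 TB of raw storage, and 25% headroom raises that to 400 TB, or 17 nodes, before counting power, networking or staff.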