What is the difference between a Big Data Warehouse and a traditional Data Warehouse?

0 votes

Data warehouses in the context of big data are usually implemented and managed on a Hadoop-based system such as Apache Hive (right?).
My question, however, is about the methodological process.
How does big data affect the design process of a data warehouse?
Is the process similar, or must new tasks be considered?

Aug 9, 2018 in Big Data Hadoop by Neha
• 6,280 points
61 views

1 answer to this question.

0 votes

Hadoop is similar in architecture to MPP data warehouses, but with some significant differences. Instead of being rigidly defined by a parallel architecture, processors are loosely coupled across a Hadoop cluster, and each can work on different data sources.

The data manipulation engine, data catalog, and storage engine can work independently of each other with Hadoop serving as a collection point. Also critical is that Hadoop can easily accommodate both structured and unstructured data. 

This makes it an ideal environment for iterative inquiry. Instead of having to define analytics outputs according to narrow constructs defined by the schema, business users can experiment to find what queries matter to them most. Relevant data can then be extracted and loaded into a data warehouse for fast queries.

The Hadoop ecosystem starts from the same aim, collecting as much interesting data as possible from different systems, but approaches it in a radically different way.

With this approach, you dump all data of interest into a big data store (usually HDFS, the Hadoop Distributed File System). This store is often hosted in the cloud, which suits the task because cloud storage is cheap and flexible and puts the data close to cheap cloud computing power. You can still do ETL and build a data warehouse with tools like Hive if you want, but more importantly you keep all of the raw data, so you can define new questions and run complex analyses over the full raw history whenever you wish.
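To make the "keep the raw data, build the warehouse later" idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the event records, field names, and function names are hypothetical, a Python list stands in for files in HDFS or cloud storage, and an in-memory SQLite table stands in for a warehouse table such as one you might define in Hive.

```python
import json
import sqlite3

# Hypothetical raw events, stored exactly as they arrived
# (a stand-in for files landed in HDFS or cloud storage).
RAW_EVENTS = [
    '{"user_id": "a", "action": "view", "price": 10.0}',
    '{"user_id": "b", "action": "buy", "price": 25.5, "coupon": "X1"}',
    '{"user_id": "a", "action": "buy", "price": 4.5}',
]

def load_purchases(raw_lines):
    """Apply a schema at read time: keep purchases, project (user_id, price)."""
    rows = []
    for line in raw_lines:
        event = json.loads(line)
        if event.get("action") == "buy":
            rows.append((event["user_id"], float(event["price"])))
    return rows

def build_warehouse(rows):
    """Load curated rows into a relational table (stand-in for a warehouse table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (user_id TEXT, price REAL)")
    conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)
    return conn

def revenue_by_user(conn):
    """A fast, predefined warehouse query over the curated table."""
    cur = conn.execute(
        "SELECT user_id, SUM(price) FROM purchases GROUP BY user_id ORDER BY user_id"
    )
    return cur.fetchall()

def coupon_usage(raw_lines):
    """A question defined later, answerable only because the raw data was kept."""
    return sum(1 for line in raw_lines if "coupon" in json.loads(line))
```

The warehouse table here drops the coupon field because nobody asked for it at design time, yet `coupon_usage(RAW_EVENTS)` can still answer the new question from the raw store. That is the design difference in miniature: the schema is chosen per question, not once up front.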

The Hadoop toolset allows great flexibility and power of analysis, since it does big computation by splitting a task over large numbers of cheap commodity machines, letting you perform much more powerful, speculative, and rapid analyses than is possible in a traditional warehouse.
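The "split a task over many machines" idea can be sketched in a few lines of Python. This is not Hadoop code: the function names are hypothetical, and a local thread pool stands in for cluster nodes, purely to show the split / map / reduce shape of the computation.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def split(records, n_parts):
    """Partition the input into n_parts chunks (round-robin)."""
    return [records[i::n_parts] for i in range(n_parts)]

def map_count(chunk):
    """Map step: each worker counts words in its own chunk only."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts

def reduce_counts(partials):
    """Reduce step: merge the per-worker partial counts into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

def word_count(records, n_workers=3):
    """Run the map step on each chunk in parallel, then reduce."""
    chunks = split(records, n_workers)
    # Threads are an assumption standing in for separate commodity machines.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_count, chunks))
    return reduce_counts(partials)
```

For example, `word_count(["big data", "data warehouse", "big big"])` gives the same counts whether it runs with one worker or three, because each chunk is processed independently and only the small partial results are merged at the end.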

answered Aug 9, 2018 by Frankie
• 9,810 points
