Where do Big Data tools like Hadoop and Spark come into the picture when we talk about ETL?

0 votes
I have been working with Hadoop for the last 4 months. I'm curious to know where ETL tools fit in with Big Data tools like Hadoop and Spark, and for what purpose they are used.
May 3, 2018 in Big Data Hadoop by Shubham

1 answer to this question.

0 votes

ETL stands for Extract, Transform & Load.

A typical ETL pipeline consists of a data source, followed by a transformation, used for filtering or cleaning data, ending in a data sink.

[Diagram: ETL pipeline — data source → transformation → data sink]
In the case of Hadoop and Spark, an ETL flow can be defined as:

Extract: data comes in from various sources such as databases, Kafka, Twitter, etc.

Transform: to get meaningful insights, we filter or clean the data using Spark, MapReduce, Hive, Pig, etc.

Load: after processing (transformation), the data is stored in a data sink such as HDFS, a Hive table, etc.
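The three steps above can be sketched in plain Python. This is only an illustration of the flow, not Spark code: all function names and the sample records are hypothetical, and in a real Big Data pipeline the extract step would read from Kafka or a database, the transform would run on Spark or MapReduce, and the load would write to HDFS or a Hive table.

```python
def extract():
    # Stand-in for reading raw events from a source (e.g. a Kafka topic).
    return [
        {"user": "alice", "clicks": 12},
        {"user": None, "clicks": 5},    # malformed record (missing user)
        {"user": "bob", "clicks": -3},  # invalid value (negative count)
        {"user": "carol", "clicks": 7},
    ]

def transform(records):
    # Filter/clean: drop records with a missing user or a negative count.
    return [r for r in records if r["user"] and r["clicks"] >= 0]

def load(records, sink):
    # Stand-in for writing the cleaned data to a sink (e.g. an HDFS path).
    sink.extend(records)

sink = []
load(transform(extract()), sink)
print(sink)  # only the two clean records (alice and carol) survive
```

In Spark, `transform` would typically be a chain of `filter`/`map` operations on an RDD or DataFrame, but the extract → transform → load shape of the pipeline is the same.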

Hope this helps.

answered May 3, 2018 by nitinrawat895
