Where do Big Data tools like Hadoop and Spark come into the picture when we talk about ETL?

I have been working with Hadoop for the last 4 months, so I'm curious to know where ETL tools fit in when using Big Data tools like Hadoop and Spark, and for what purpose.
May 3, 2018 in Big Data Hadoop by Shubham

1 answer to this question.


ETL stands for Extract, Transform & Load.

A typical ETL pipeline starts with a data source, applies a transformation (used for filtering or cleaning the data), and ends in a data sink.

So in the case of Hadoop and Spark, an ETL flow can be defined as:

Extract: data comes from various sources such as databases, Kafka, Twitter, etc.

Transform: to get meaningful insights, we filter or clean the data using Spark, MapReduce, Hive, Pig, etc.

Load: finally, after processing (transformation), the data is stored in a data sink such as HDFS, a table, etc.
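The three steps above can be sketched in plain Python. This is only a minimal illustration of the Extract → Transform → Load flow, not a Spark program: the records, the cleaning rule, and the sink file are all hypothetical stand-ins (a local file stands in for HDFS, a list of dicts stands in for a Kafka/database source).

```python
import json
import os
import tempfile

def extract():
    # Extract: pretend these records arrived from a database or Kafka topic
    return [
        {"user": "alice", "clicks": 12},
        {"user": None, "clicks": 3},   # dirty record: missing user
        {"user": "bob", "clicks": 0},  # zero-activity record
    ]

def transform(records):
    # Transform: filter out dirty records and drop zero-click rows
    return [r for r in records if r["user"] and r["clicks"] > 0]

def load(records, path):
    # Load: write the cleaned records to the data sink
    # (a local JSON-lines file standing in for HDFS or a table)
    with open(path, "w") as f:
        for r in records:
            f.write(json.dumps(r) + "\n")

sink = os.path.join(tempfile.gettempdir(), "etl_sink.jsonl")
load(transform(extract()), sink)
```

In a real Hadoop/Spark deployment, `extract` would be a Spark data-source read, `transform` a chain of DataFrame or RDD operations distributed across the cluster, and `load` a write to HDFS or a Hive table; the shape of the pipeline stays the same.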

Hope this helps.

answered May 3, 2018 by nitinrawat895
