Where do Big Data tools like Hadoop and Spark come into the picture when we talk about ETL?

0 votes
I have been working with Hadoop for the last 4 months. I'm curious to know where ETL tools fit in with Big Data tools like Hadoop and Spark, and for what purpose they are used.
May 3, 2018 in Big Data Hadoop by Shubham

1 answer to this question.

0 votes

ETL stands for Extract, Transform & Load.

A typical ETL pipeline consists of a data source, followed by a transformation, used for filtering or cleaning data, ending in a data sink.

[Diagram: ETL pipeline — data source → transformation → data sink]
In the case of Hadoop and Spark, an ETL flow can be defined as:

Extract: data comes in from various sources such as databases, Kafka, Twitter, etc.

Transform: to get meaningful insights, we filter or clean the data using Spark, MapReduce, Hive, Pig, etc.

Load: after processing (transformation), the data is stored in a data sink such as HDFS, a Hive table, etc.
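The three steps above can be sketched in plain Python. This is only an illustration of the flow, not Spark code: all function names and the sample records are hypothetical, and in a real Big Data pipeline the extract step would read from Kafka or a database, the transform would run on Spark or MapReduce, and the load would write to HDFS or a Hive table.

```python
def extract():
    # Stand-in for reading raw events from a source (e.g. a Kafka topic).
    return [
        {"user": "alice", "clicks": 12},
        {"user": None, "clicks": 5},    # malformed record (missing user)
        {"user": "bob", "clicks": -3},  # invalid value (negative count)
        {"user": "carol", "clicks": 7},
    ]

def transform(records):
    # Filter/clean: drop records with a missing user or a negative count.
    return [r for r in records if r["user"] and r["clicks"] >= 0]

def load(records, sink):
    # Stand-in for writing the cleaned data to a sink (e.g. an HDFS path).
    sink.extend(records)

sink = []
load(transform(extract()), sink)
print(sink)  # only the two clean records (alice and carol) survive
```

In Spark, `transform` would typically be a chain of `filter`/`map` operations on an RDD or DataFrame, but the extract → transform → load shape of the pipeline is the same.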

Hope this helps.

answered May 3, 2018 by nitinrawat895
