Is fetching data with Apache Flume the same as web crawling?

Question

Hi team,

When we fetch data from social media websites using Apache Flume, isn't that the same as web crawling?

score 0 · Answer 1 · Jul 11, 2019

A web crawler is a program or automated script that browses websites, mainly to create a copy of every visited page for later processing by a search engine, which indexes the downloaded pages to provide fast searches. Flume, by contrast, is a special-purpose tool designed to send data to HDFS and HBase. It has specific optimizations for HDFS and it integrates with Hadoop’s security.
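To make the crawling side concrete, here is a minimal sketch of what a crawler does: start from one page, copy it, pull out its links, and repeat for every page not seen yet. The `PAGES` dictionary, `LinkExtractor`, and `crawl` are hypothetical names made up for illustration, and the "website" is an in-memory stand-in so the example runs without network access. A real crawler would fetch pages over HTTP and respect robots.txt.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "website": URL -> HTML (assumption for illustration).
PAGES = {
    "/home": '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "/about": '<a href="/home">Home</a>',
    "/blog": '<a href="/about">About</a> <a href="/missing">Gone</a>',
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start, pages):
    """Breadth-first crawl: returns a copy of every reachable page."""
    seen = {start}
    queue = deque([start])
    copies = {}
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:
            continue  # dead link, nothing to copy
        copies[url] = html  # keep a copy for later indexing
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return copies


if __name__ == "__main__":
    for url in crawl("/home", PAGES):
        print("copied", url)
```

The point to notice is that the crawler discovers its own inputs by following links, which is not something Flume does.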

Flume has a simple event-driven pipeline architecture with three main roles: Source, Channel, and Sink.

-->Source defines where the data is coming from, for instance, a message queue or a file.

-->Sink defines the destination of the data pipelined from the various sources.

-->Channels are pipes which establish connections between sources and sinks.

Here, the source can be any API or repository, such as those provided by Twitter, Facebook, or YouTube, and the sink is the place where you want to store the data, such as HDFS or a Hive warehouse.
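The three roles can be sketched as a toy pipeline. This is not Flume's API (Flume is a Java tool configured through properties files); `source`, `ListSink`, `drain`, and `run_agent` are hypothetical names, the list stands in for HDFS or Hive, and the batch size is an arbitrary assumption. It only shows how events move from source to channel to sink.

```python
from collections import deque


def source(records):
    """Source: wraps each incoming record (e.g. from an API) as an event."""
    for record in records:
        yield {"body": record}


class ListSink:
    """Sink: the destination. A plain list stands in for HDFS or Hive."""

    def __init__(self):
        self.stored = []

    def write(self, event):
        self.stored.append(event["body"])


def drain(channel, sink):
    """Move every buffered event from the channel to the sink, in order."""
    while channel:
        sink.write(channel.popleft())


def run_agent(records, sink, batch_size=2):
    """Channel: a buffer between source and sink, flushed in small batches."""
    channel = deque()
    for event in source(records):
        channel.append(event)
        if len(channel) >= batch_size:
            drain(channel, sink)
    drain(channel, sink)  # flush whatever is left


if __name__ == "__main__":
    sink = ListSink()
    run_agent(["tweet 1", "tweet 2", "tweet 3"], sink)
    print(sink.stored)
```

Unlike a crawler, this pipeline never goes looking for new pages. It only moves whatever the source hands it.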

The concepts behind the two technologies are quite similar, but they are used for different purposes: Flume deals with collecting and storing all kinds of logs, while web scraping deals with data scraped from websites.