Which one is better: MapReduce or Spark?

Question

There is a mapper-only job that takes data from a source and, with minimal processing, writes it to HDFS. Will the same job perform better in Spark? If it depends on the size of the source data, please explain where Spark is better and where MapReduce is better.

score 0 · Answer 1 · Jul 15, 2019

Apache Spark is generally much faster than Hadoop MapReduce, and it is the more suitable of the two for real-time analytics. It is worth knowing what makes Spark better than MapReduce, but before that you should know what exactly these technologies are.

MapReduce is a programming model for processing huge amounts of data in a parallel and distributed setting. A MapReduce program has two kinds of tasks, the Mapper and the Reducer. The Mapper reads the input records and emits key-value pairs, filtering or transforming the data along the way. The framework then sorts and groups those pairs by key, and the Reducer combines the values for each key into a smaller, aggregated result. MapReduce, HDFS and YARN are the three main components of a Hadoop system.

Spark is a newer, rapidly growing open source technology that works very well on a cluster of computer nodes. Speed is one of the hallmarks of Apache Spark, largely because it can keep intermediate data in memory instead of writing it to disk between steps. Developers working in this environment get an application programming interface built around the RDD (Resilient Distributed Dataset). An RDD is the abstraction Spark provides for an immutable collection of records that is split into partitions spread across the nodes of the cluster, so that each partition can be processed independently and in parallel.
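To make the map, shuffle and reduce steps concrete, here is a minimal single-machine sketch in plain Python. It is only an illustration of the programming model, not Hadoop or Spark code: the function names (`mapper`, `shuffle`, `reducer`, `run_job`) and the sample lines are made up for this example, and a real cluster would run the mappers and reducers on different nodes.

```python
from collections import defaultdict

def mapper(line):
    """Map phase: turn one input record into (key, value) pairs."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle/sort phase (done by the framework): group values by key, sorted by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reducer(key, values):
    """Reduce phase: aggregate all values that share one key."""
    return key, sum(values)

def run_job(lines, with_reduce=True):
    """Run the hypothetical job; with_reduce=False mimics a mapper-only job."""
    mapped = [pair for line in lines for pair in mapper(line)]
    if not with_reduce:
        # Mapper-only job: no shuffle and no reducers, the map output is the result.
        return mapped
    return [reducer(key, values) for key, values in shuffle(mapped)]

# Made-up sample input, standing in for records read from the source.
lines = ["Spark and MapReduce", "spark on HDFS"]

word_counts = run_job(lines)
map_only_output = run_job(lines, with_reduce=False)

print(word_counts)
print(map_only_output)
```

Note that the mapper-only variant skips the shuffle and reduce phases entirely, which is the case described in the question: each record is read, lightly transformed and written out, with nothing aggregated across records.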