Why is Spark faster than Hadoop MapReduce?

Question

Can anyone explain why certain programs are faster in Spark than in MapReduce?

shams · Answer 1 · Apr 30, 2018

First, it's the in-memory computation. In MapReduce, each job loads its input from HDFS, does the processing, and stores the result back to HDFS, so a pipeline of multiple MR jobs pays for disk I/O between every pair of jobs. Spark can keep intermediate data in memory across transformations, and the result is written to HDFS only when an action at the end of the chain asks for it. This saves a lot of time, especially for multi-step or iterative jobs.
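To make the I/O difference concrete, here is a toy sketch in plain Python. It is not Spark or Hadoop code: local JSON files stand in for HDFS, and the function names (run_chained_on_disk, run_chained_in_memory) are made up for illustration. It only counts how often each style writes to disk for the same three-step pipeline.

```python
import json
import os
import tempfile


def run_chained_on_disk(records, steps, workdir):
    """MapReduce-style chain: every step writes its output to a file and reads it back."""
    disk_writes = 0
    for i, step in enumerate(steps):
        records = [step(r) for r in records]
        path = os.path.join(workdir, f"step_{i}.json")
        with open(path, "w") as f:      # intermediate result goes to "HDFS"
            json.dump(records, f)
        disk_writes += 1
        with open(path) as f:           # the next job has to load it again
            records = json.load(f)
    return records, disk_writes


def run_chained_in_memory(records, steps, workdir):
    """Spark-style chain: intermediates stay in memory, only the final result is written."""
    for step in steps:
        records = [step(r) for r in records]
    with open(os.path.join(workdir, "final.json"), "w") as f:
        json.dump(records, f)
    return records, 1


# Hypothetical three-step pipeline over a tiny dataset.
steps = [lambda x: x + 1, lambda x: x * 10, lambda x: x - 3]
with tempfile.TemporaryDirectory() as workdir:
    disk_result, disk_writes = run_chained_on_disk([1, 2, 3], steps, workdir)
    mem_result, mem_writes = run_chained_in_memory([1, 2, 3], steps, workdir)
```

Both versions produce the same result, but the first one writes to disk once per step while the second writes once in total. On a real cluster each of those extra writes and reads goes through HDFS, which is where the time goes.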

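Second, Spark uses lazy evaluation: consecutive transformations are only recorded in a DAG (Directed Acyclic Graph), and nothing runs until an action is called. Because Spark sees the whole chain before executing it, it can pipeline transformations that need no shuffle into a single stage and skip unnecessary intermediate results, so execution is optimized and less data gets moved around.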
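The idea of lazy evaluation can be sketched in a few lines of plain Python. This is a toy model, not Spark's API: LazyDataset is a hypothetical stand-in for an RDD, and its list of recorded steps is a linear chain rather than a full DAG. Transformations only extend the plan, and the action (collect) runs all of them in a single pass.

```python
class LazyDataset:
    """Toy stand-in for an RDD: records transformations, runs them only on an action."""

    def __init__(self, source, steps=()):
        self._source = source   # any iterable of input records
        self._steps = steps     # recorded (kind, function) pairs, i.e. the plan

    def map(self, fn):
        # Transformation: returns a new plan, no data is touched yet.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: same as map, only the plan grows.
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def collect(self):
        # Action: one pass over the source; each record flows through every
        # recorded step, so no intermediate list is built between steps.
        out = []
        for record in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    record = fn(record)
                elif not fn(record):
                    keep = False
                    break
            if keep:
                out.append(record)
        return out


calls = []  # every element the map function actually sees


def traced_double(x):
    calls.append(x)
    return x * 2


plan = LazyDataset(range(5)).map(traced_double).filter(lambda x: x > 4)
calls_before_action = len(calls)  # nothing has executed at this point
result = plan.collect()           # the action triggers the whole chain
```

Before collect() is called, traced_double has not run at all. After it, result holds the doubled values greater than 4, and the map and filter were applied together in one pass over the input.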

Lastly, Spark has its own SQL, machine learning, graph, and streaming components. With Hadoop you have to install other frameworks separately for these, and moving data between those frameworks is a nasty job.

Hope it helps.