MapReduce is a programming model used to process large data sets stored in HDFS in parallel. A MapReduce job is divided into two main tasks, Map and Reduce. The Map task takes a set of input data and converts it into another set of data in which individual elements are represented as key/value pairs. The framework then groups these intermediate pairs by key. In the Reduce task, the output from the map is taken as input, and the values for each key are combined into a smaller set of key/value pairs.
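To make the Map, group-by-key, and Reduce steps concrete, here is a minimal single-process word-count sketch in plain Python. It only mimics the data flow; it does not use Hadoop's API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Map: turn one input record into intermediate (word, 1) key/value pairs.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group every intermediate value under its key, sorted by key,
    # imitating what the framework does between the map and reduce tasks.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine all values for one key into a single output pair.
    return key, sum(values)


def run_job(lines: List[str]) -> Dict[str, int]:
    # Hypothetical driver: map every line, group by key, then reduce each group.
    intermediate = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(run_job(docs))
```

Running it prints `{'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}`: nine intermediate pairs are reduced to six output pairs. In real Hadoop, the map and reduce tasks run on different machines and the grouping step moves data across the network.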
YARN (Yet Another Resource Negotiator) splits the functions of resource management and job scheduling/monitoring into separate daemons: a global ResourceManager and a per-application ApplicationMaster.
The ResourceManager arbitrates resources among all the applications in the system. The NodeManager is the per-machine agent responsible for containers: it monitors their resource usage (CPU, memory, disk, network) and reports it to the ResourceManager/Scheduler.
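The division of labour between the two daemons can be illustrated with a toy model. This is not YARN's Java API. The classes only borrow the daemon names, and the methods `launch`, `report`, and `allocate`, as well as the first-fit placement policy, are hypothetical simplifications. Each NodeManager tracks its own containers and reports free capacity, and the ResourceManager uses those reports to decide where a requested container can run.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NodeManager:
    # Toy per-machine agent: knows its capacity and the containers it runs.
    name: str
    memory_mb: int
    vcores: int
    containers: list = field(default_factory=list)
    used_memory_mb: int = 0
    used_vcores: int = 0

    def launch(self, container_id: str, memory_mb: int, vcores: int) -> None:
        # Start a container and account for the resources it consumes.
        self.containers.append(container_id)
        self.used_memory_mb += memory_mb
        self.used_vcores += vcores

    def report(self) -> dict:
        # Usage report sent to the ResourceManager (hypothetical format).
        return {
            "free_memory_mb": self.memory_mb - self.used_memory_mb,
            "free_vcores": self.vcores - self.used_vcores,
        }


class ResourceManager:
    # Toy global arbiter: grants container requests using node reports.
    def __init__(self, nodes: list) -> None:
        self.nodes = nodes
        self._next_id = 0

    def allocate(self, app: str, memory_mb: int, vcores: int) -> str | None:
        # First-fit placement; real YARN schedulers (e.g. Capacity, Fair)
        # use queues, priorities and locality and are far more elaborate.
        for node in self.nodes:
            free = node.report()
            if free["free_memory_mb"] >= memory_mb and free["free_vcores"] >= vcores:
                self._next_id += 1
                container_id = f"{app}_container_{self._next_id}"
                node.launch(container_id, memory_mb, vcores)
                return container_id
        return None  # no node has enough spare capacity right now


if __name__ == "__main__":
    nodes = [NodeManager("node1", 4096, 4), NodeManager("node2", 2048, 2)]
    rm = ResourceManager(nodes)
    print(rm.allocate("app1", 3072, 2))  # placed on node1
    print(rm.allocate("app1", 2048, 2))  # node1 is too full, placed on node2
    print(rm.allocate("app2", 2048, 1))  # no room anywhere
    print(nodes[0].report())
```

With the sample cluster, the three requests return `app1_container_1`, `app1_container_2`, and `None`, and node1 then reports 1024 MB and 2 vcores free.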