Differences are as follows:
- Hadoop MR can use all CPU cores of a machine, while MongoDB's MR is single-threaded.
- When the data lives in MongoDB, Hadoop MR will not be collocated with it (the data has to be read out over the network), while MongoDB's MR runs where the data already is.
- Hadoop MR has millions of engine-hours of production use behind it and copes with many corner cases, such as massive output sizes, data skew, etc.
- Higher-level frameworks such as Pig, Hive and Cascading are built on top of the Hadoop MR engine.
- Hadoop MR is mainstream, and a lot of community support is available for it.
Based on the above, I suggest the following selection criteria:
Select MongoDB MR if you need simple grouping and filtering and do not expect heavy shuffling between map and reduce; in other words, something simple.
Select Hadoop MR if you are going to run complicated, computationally intensive MR jobs (for example, regression calculations). A large or unpredictable volume of data passed between map and reduce also points to Hadoop MR.
Also take the job language into account: Hadoop MR jobs are typically written in Java, which has a much richer set of libraries (especially statistical ones) than the JavaScript used for MongoDB MR.
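To make "simple grouping and filtering with little shuffling" concrete, here is a minimal in-memory Python sketch of the map, shuffle and reduce phases. It is not the API of MongoDB or Hadoop; `map_reduce`, `paid_order_mapper`, `sum_reducer` and the order records are hypothetical illustrations. The filter happens in the map step, the per-key sum happens in the reduce step, and the sketch counts how many (key, value) pairs cross from map to reduce, which is the shuffle volume the criteria above refer to.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Hypothetical single-process map -> shuffle -> reduce, for illustration only."""
    groups = defaultdict(list)  # shuffle: emitted values grouped by key
    shuffled = 0                # number of (key, value) pairs sent from map to reduce
    for record in records:
        # Map phase: each record may emit zero or more (key, value) pairs.
        for key, value in mapper(record):
            groups[key].append(value)
            shuffled += 1
    # Reduce phase: collapse each key's list of values into one result.
    results = {key: reducer(key, values) for key, values in groups.items()}
    return results, shuffled

# Hypothetical input data.
orders = [
    {"customer": "alice", "status": "paid", "amount": 30},
    {"customer": "bob", "status": "paid", "amount": 20},
    {"customer": "alice", "status": "cancelled", "amount": 99},
    {"customer": "alice", "status": "paid", "amount": 12},
]

def paid_order_mapper(order):
    # Filtering is done in map: cancelled orders emit nothing.
    if order["status"] == "paid":
        yield order["customer"], order["amount"]

def sum_reducer(_key, amounts):
    # Group-by aggregate: total paid amount per customer.
    return sum(amounts)

totals, shuffled = map_reduce(orders, paid_order_mapper, sum_reducer)
print(totals)    # {'alice': 42, 'bob': 20}
print(shuffled)  # 3
```

Here only one small number per matching record is shuffled, which is the kind of job where MongoDB MR is usually sufficient. Summing is also safe to re-apply to partial results, which matters because both MongoDB (which may call reduce several times for the same key) and Hadoop (via combiners) can reduce a key's values in more than one pass.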