What is the best functional language to do Hadoop Map-Reduce?

I'm doing an assignment for a course that requires me to implement a parallel MapReduce engine in a functional language and then use it to solve certain simple problems.

Which functional language do you think I should use?

Here are my requirements:

  • Should be relatively easy to learn, since I have only about 2 weeks for this assignment.
  • Has existing MapReduce implementations that can be found on the web - my course does not forbid me from using open-source code or internet resources in general.
  • Should fit the problem, and be an overall worthwhile language to learn (a relatively popular language).

I am currently considering Haskell and Clojure, but both languages are new to me - I have no idea if either of them is actually appropriate for the situation.

Sep 4, 2018 in Big Data Hadoop by Neha

1 answer to this question.

Both Clojure and Haskell are definitely worth learning, for different reasons. If you get a chance, I would try both. I'd also suggest adding Scala to your list.

If you have to pick one, I would choose Clojure, for the following reasons:

  • It's a Lisp - everyone should learn a Lisp. See http://www.paulgraham.com/avg.html
  • It has a unique approach to concurrency - see http://www.infoq.com/presentations/Value-Identity-State-Rich-Hickey
  • It's a JVM language, which makes it immediately useful from a practical perspective: the library & tool ecosystem on the JVM is extremely good, better than any other platform IMHO. If you want to do serious tech. work in the enterprise or startup space, it is very helpful to gain a good knowledge of the JVM. FWIW, Scala also falls into this category of "interesting JVM languages".

Also, Clojure makes parallel map-reduce very easy. Here's a one-liner to start with:

(reduce + (pmap inc (range 1000)))
=> 500500

Using pmap rather than map is enough to give you a parallel mapping operation. There are also parallel reducers if you use Clojure 1.5 or later; see the reducers framework (clojure.core.reducers) for more details.
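To make the idea a bit more concrete, here is a minimal sketch of a word count built the same way: pmap for the map phase and merge-with for the reduce phase. The function names count-words and word-count are made up for illustration, and this is only one possible shape for such an engine, not a reference implementation. The last form shows the reducers library doing the same sum as the one-liner; note that r/fold only runs in parallel over foldable collections such as vectors, which is why the range is wrapped in vec.

```clojure
(require '[clojure.string :as str]
         '[clojure.core.reducers :as r])

;; Map phase (hypothetical helper): turn one line of text
;; into a map of word -> count for that line.
(defn count-words [line]
  (frequencies (re-seq #"[a-z0-9]+" (str/lower-case line))))

;; Reduce phase (hypothetical helper): merge the per-line maps,
;; summing the counts of words that appear in more than one line.
;; pmap runs count-words on several lines in parallel.
(defn word-count [lines]
  (reduce (partial merge-with +) {} (pmap count-words lines)))

(word-count ["the quick brown fox" "the lazy dog" "The fox"])
;; => {"the" 3, "quick" 1, "brown" 1, "fox" 2, "lazy" 1, "dog" 1}

;; The same sum as the one-liner, using the reducers library.
(r/fold + (r/map inc (vec (range 1000))))
;; => 500500
```

Keep in mind that pmap only pays off when the work per item is expensive compared to the coordination overhead, so for tiny inputs like the ones here plain map will usually be just as fast.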

You can also use Scalding, a Scala abstraction on top of Cascading that hides low-level Hadoop details. It was developed at Twitter and seems mature enough that you can start using it without too much trouble.

Here is an example of how you would do a word count in Scalding:

package com.twitter.scalding.examples

import com.twitter.scalding._

class WordCountJob(args : Args) extends Job(args) {
  TextLine( args("input") )
    .flatMap('line -> 'word) { line : String => tokenize(line) }
    .groupBy('word) { _.size }
    .write( Tsv( args("output") ) )

  // Split a piece of text into individual words.
  def tokenize(text : String) : Array[String] = {
    // Lowercase each word and remove punctuation.
    text.toLowerCase.replaceAll("[^a-zA-Z0-9\\s]", "").split("\\s+")
  }
}

I think it's a good candidate: because it's written in Scala, it's not too far from regular Java Map/Reduce programs, and even if you don't know Scala, it's not too hard to pick up.

answered Sep 4, 2018 by Frankie
