What is the best functional language for Hadoop MapReduce?

I'm doing an assignment for a course that requires me to implement a parallel MapReduce engine in a functional language and then use it to solve certain simple problems.

Which functional language do you think I should use?

Here are my requirements:

  • Should be relatively easy to learn, since I have only about 2 weeks for this assignment.
  • Should have existing MapReduce implementations that can be found on the web; my course does not forbid me from using open-source code or internet resources in general.
  • Should fit the problem, and be an overall worthwhile language to learn (a relatively popular language).

I am currently considering Haskell and Clojure, but both languages are new to me, and I have no idea whether either of them is actually appropriate for this situation.

Sep 4, 2018 in Big Data Hadoop by Neha

1 answer to this question.

Both Clojure and Haskell are definitely worth learning, for different reasons. If you get a chance, I would try both. I'd also suggest adding Scala to your list.

If you have to pick one, I would choose Clojure, for the following reasons:

  • It's a Lisp - everyone should learn a Lisp. See http://www.paulgraham.com/avg.html
  • It has a unique approach to concurrency - see http://www.infoq.com/presentations/Value-Identity-State-Rich-Hickey
  • It's a JVM language, which makes it immediately useful from a practical perspective: the library and tool ecosystem on the JVM is extremely good, better than any other platform IMHO. If you want to do serious technical work in the enterprise or startup space, it is very helpful to gain a good knowledge of the JVM. FWIW, Scala also falls into this category of "interesting JVM languages".

Also, Clojure makes parallel map-reduce very easy. Here's a simple example to start with:

(reduce + (pmap inc (range 1000)))
;; => 500500

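Using pmap rather than map is enough to turn this into a parallel mapping operation, although pmap only pays off when the mapped function is expensive enough to outweigh its coordination overhead (for a function as cheap as inc it will usually be slower than plain map). Clojure 1.5 and later also provide parallel reducers; see the clojure.core.reducers framework, in particular its fold function, for more details.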
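Since Scala is also on the list, here is a rough Scala analogue of the Clojure one-liner `(reduce + (pmap inc (range 1000)))`, using only the standard library (scala.concurrent.Future). It is a minimal sketch rather than any library's API: `pmap` and `pfold` are hypothetical helper names chosen to mirror Clojure's pmap and the reducers' fold, and spawning one Future per element carries the same overhead caveat as pmap.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object ParallelMapReduce {
  // Hypothetical parallel map: run f on every element as its own Future,
  // keeping results in input order (roughly what Clojure's pmap gives you).
  def pmap[A, B](xs: Seq[A])(f: A => B): Seq[B] =
    Await.result(Future.traverse(xs)(x => Future(f(x))), Duration.Inf)

  // Hypothetical chunked parallel fold, in the spirit of clojure.core.reducers/fold:
  // fold each chunk independently in parallel, then combine the partial results.
  // `op` must be associative and `zero` its identity for the result to be correct.
  def pfold[A](xs: Seq[A], chunkSize: Int)(zero: A)(op: (A, A) => A): A = {
    val chunks = xs.grouped(chunkSize).toVector
    val partials = Future.traverse(chunks)(chunk => Future(chunk.foldLeft(zero)(op)))
    Await.result(partials, Duration.Inf).foldLeft(zero)(op)
  }

  def main(args: Array[String]): Unit = {
    // Same computation as (reduce + (pmap inc (range 1000)))
    val viaPmap = pmap(0 until 1000)(_ + 1).sum
    // Same sum, computed with the chunked parallel fold over 1..1000
    val viaFold = pfold((0 until 1000).map(_ + 1), chunkSize = 128)(0)(_ + _)
    println(s"$viaPmap $viaFold") // prints "500500 500500"
  }
}
```

pfold only gives the right answer when op is associative and zero is its identity, which is essentially the same contract clojure.core.reducers/fold places on its combining function.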

Apart from that, you can also use Scalding, a Scala library built on top of Cascading that abstracts away low-level Hadoop details. It was developed at Twitter and seems mature enough today that you can start using it without too much trouble.

Here is an example of how you would write a word count in Scalding:

package com.twitter.scalding.examples

import com.twitter.scalding._

class WordCountJob(args : Args) extends Job(args) {
  TextLine( args("input") )
    .flatMap('line -> 'word) { line : String => tokenize(line) }
    .groupBy('word) { _.size }
    .write( Tsv( args("output") ) )

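  // Split a piece of text into individual words.
  def tokenize(text : String) : Array[String] = {
    // Lowercase each word, remove punctuation, and drop empty tokens
    // (splitting a blank line or one with leading whitespace yields "").
    text.toLowerCase.replaceAll("[^a-zA-Z0-9\\s]", "").split("\\s+").filter(_.nonEmpty)
  }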
}

I think it's a good candidate: because it's written in Scala, it's not too far from regular Java MapReduce programs, and even if you don't know Scala, it's not too hard to pick up.
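If you end up writing the engine yourself, as the assignment requires, the three MapReduce phases (map, shuffle by key, reduce) can be prototyped in a few lines of plain Scala before worrying about distribution. The sketch below is in-memory and single-process, uses only the standard library rather than Scalding or Hadoop, and `MiniMapReduce.run` is a hypothetical name; the word count reuses the tokenize logic from the Scalding job, plus a filter that drops empty tokens.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object MiniMapReduce {
  // Hypothetical in-memory MapReduce engine:
  //   map phase    - apply `mapper` to every input record in parallel
  //   shuffle      - group the emitted (key, value) pairs by key
  //   reduce phase - apply `reducer` to each key's values in parallel
  def run[I, K, V, R](inputs: Seq[I])(mapper: I => Seq[(K, V)])(reducer: (K, Seq[V]) => R): Map[K, R] = {
    val mapped: Seq[(K, V)] =
      Await.result(Future.traverse(inputs)(in => Future(mapper(in))), Duration.Inf).flatten

    val shuffled: Map[K, Seq[V]] =
      mapped.groupBy(_._1).map { case (k, pairs) => k -> pairs.map(_._2) }

    val reduced: Seq[(K, R)] =
      Await.result(
        Future.traverse(shuffled.toSeq) { case (k, vs) => Future(k -> reducer(k, vs)) },
        Duration.Inf)
    reduced.toMap
  }

  // Same tokenization as the Scalding job: lowercase, strip punctuation,
  // split on whitespace, and drop empty tokens.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.replaceAll("[^a-zA-Z0-9\\s]", "").split("\\s+").toSeq.filter(_.nonEmpty)

  // Word count expressed as a mapper (emit (word, 1)) and a reducer (sum the ones).
  def wordCount(lines: Seq[String]): Map[String, Int] =
    run(lines)(line => tokenize(line).map(word => (word, 1)))((_, ones) => ones.sum)

  def main(args: Array[String]): Unit = {
    val counts = wordCount(Seq("The quick brown fox", "the lazy dog.", "THE END"))
    println(counts("the")) // prints 3
  }
}
```

Parallelism here comes only from running mapper and reducer calls as Futures on the global execution context; an engine that runs across machines would also have to partition the input and move intermediate data between nodes.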

answered Sep 4, 2018 by Frankie