MapReduce in Python

–1 vote
Hi. I am learning Hadoop and working through the concepts of MapReduce. So far I have understood the concepts and have run MapReduce code in Java, but I am more interested in Python scripting and I don't know how to write a MapReduce task in Python. Can someone share some sample code?
Dec 20, 2018 in Big Data Hadoop by digger

1 answer to this question.

0 votes

mapper.py

#!/usr/bin/python
import sys

# Word count example
# Input comes from standard input (STDIN)
for line in sys.stdin:
    line = line.strip()   # remove leading and trailing whitespace
    words = line.split()  # split the line into a list of words
    for word in words:
        # Write the result to standard output (STDOUT) as word<TAB>1.
        # The parenthesized print works on both Python 2 and Python 3.
        print('%s\t%s' % (word, 1))
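To check what the mapper emits without a Hadoop cluster, the same logic can be wrapped in a function and run on a couple of lines. This is only a sketch: the name `map_words` and the sample sentences are made up for illustration and are not part of the answer's script.

```python
def map_words(lines):
    """Yield one 'word<TAB>1' record per word, mirroring mapper.py."""
    for line in lines:
        # strip() drops surrounding whitespace, split() breaks on any whitespace
        for word in line.strip().split():
            yield '%s\t%s' % (word, 1)


# Hypothetical sample input standing in for lines read from STDIN
sample = ["the quick brown fox", "the lazy dog"]
records = list(map_words(sample))

for record in records:
    print(record)
```

Note that the mapper does no counting itself: "the" is emitted twice, once per occurrence, and adding them up is left to the reducer.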


reducer.py

#!/usr/bin/python
import sys

# Hadoop sorts the mapper output by key before it reaches the reducer,
# so all counts for a word arrive on consecutive lines. We only need to
# track the word currently being summed.
current_word = None
current_count = 0

# Input comes from STDIN
for line in sys.stdin:
    line = line.strip()
    try:
        word, count = line.split('\t', 1)
        count = int(count)
    except ValueError:
        # Skip lines that are not in word<TAB>count form
        continue
    if current_word == word:
        current_count += count
    else:
        if current_word:
            # The word changed, so emit the finished total
            print('%s\t%s' % (current_word, current_count))
        current_count = count
        current_word = word

# Emit the total for the last word, if there was any input
if current_word:
    print('%s\t%s' % (current_word, current_count))
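The reducer only works when its input is sorted by word, which Hadoop takes care of between the map and reduce phases. A minimal local sketch of that step, assuming mapper-style `word<TAB>count` records, uses `sorted()` as a stand-in for Hadoop's sort and `itertools.groupby` to sum each run of equal words. The function name `reduce_counts` and the sample records are hypothetical.

```python
from itertools import groupby


def reduce_counts(records):
    """Sum the counts per word from 'word<TAB>count' records."""
    pairs = (record.split('\t', 1) for record in records)
    # sorted() stands in for the sort Hadoop performs before the reducer runs
    ordered = sorted((word, int(count)) for word, count in pairs)
    # groupby() groups consecutive pairs that share the same word
    for word, group in groupby(ordered, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


# Hypothetical mapper output, deliberately unsorted
mapper_output = ["the\t1", "quick\t1", "the\t1", "dog\t1"]
totals = list(reduce_counts(mapper_output))

for word, total in totals:
    print('%s\t%s' % (word, total))
```

Without the sort, the two "the" records would not be adjacent and would be reported as two separate totals, which is the same assumption reducer.py makes about its input.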
answered Dec 20, 2018 by Omkar
