MapReduce in Python

0 votes
Hey. I am learning Hadoop and working through the concepts of MapReduce. So far I understand the concepts and have run MapReduce code in Java, but I am more interested in Python scripting and don't know how to write a MapReduce task in Python. Can someone share some sample code?
Dec 20, 2018 in Big Data Hadoop by digger
• 27,620 points
24 views

1 answer to this question.

0 votes

mapper.py

#!/usr/bin/python
import sys

# Word count example: the mapper emits "<word>\t1" for every word it reads.
# Input comes from standard input (STDIN).
for line in sys.stdin:
    line = line.strip()   # remove leading and trailing whitespace
    words = line.split()  # split the line into a list of words
    for word in words:
        # Write the result to standard output (STDOUT) as a tab-separated pair
        print('%s\t%s' % (word, 1))
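
To check the mapper logic without a Hadoop cluster, the same steps can be written as a plain function. This is only a sketch: the names `map_words` and `format_pairs` and the sample sentences are made up for illustration and are not part of mapper.py.

```python
# Hypothetical helper names; they mirror the logic of mapper.py for local testing.

def map_words(lines):
    """Yield a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.strip().split():
            yield word, 1


def format_pairs(pairs):
    """Format pairs as tab-separated "<word>\t<count>" records."""
    return ["%s\t%d" % (word, count) for word, count in pairs]


# Made-up sample input standing in for the lines read from STDIN
sample = ["the quick brown fox", "the lazy dog"]
mapper_output = format_pairs(map_words(sample))

for record in mapper_output:
    print(record)
```

Each word is emitted once per occurrence, so "the" appears twice in the output. Adding the counts together is left to the reducer.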


reducer.py

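#!/usr/bin/python
import sys

# Hadoop Streaming sorts the mapper output by key before it reaches the
# reducer, so all lines for a given word arrive consecutively and a single
# running count is enough.
current_word = None
current_count = 0

# Input comes from STDIN
for line in sys.stdin:
    line = line.strip()
    try:
        # Parse the "<word>\t<count>" pair emitted by mapper.py
        word, count = line.split('\t', 1)
        count = int(count)
    except ValueError:
        # Skip lines that are malformed or whose count is not a number
        continue
    if current_word == word:
        current_count += count
    else:
        if current_word is not None:
            # The word changed, so emit the finished total
            print('%s\t%s' % (current_word, current_count))
        current_count = count
        current_word = word

# Emit the total for the last word, if any input was read
if current_word is not None:
    print('%s\t%s' % (current_word, current_count))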
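
The whole map, sort, reduce flow can also be simulated in one Python process to see what the two scripts produce together. This is a rough sketch rather than a replacement for reducer.py: `reduce_counts` and `word_count` are hypothetical names, the input is made up, and `itertools.groupby` is swapped in for the manual `current_word` bookkeeping.

```python
from itertools import groupby
from operator import itemgetter


def reduce_counts(sorted_pairs):
    """Sum the counts of consecutive pairs that share the same word.

    Like reducer.py, this relies on the pairs being sorted by word.
    """
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Simulate map -> sort -> reduce locally (hypothetical helper)."""
    # Map: emit (word, 1) for every word, as mapper.py does
    pairs = [(word, 1) for line in lines for word in line.split()]
    # Sort: stands in for the sort-by-key step between mapper and reducer
    pairs.sort(key=itemgetter(0))
    # Reduce: add up the counts per word
    return list(reduce_counts(pairs))


# Made-up sample input
result = word_count(["the quick brown fox", "the lazy dog"])
for word, total in result:
    print("%s\t%d" % (word, total))
```

The same check can be done from a shell by piping a text file through `python mapper.py | sort | python reducer.py`. On a cluster, the two scripts are normally passed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options.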
answered Dec 20, 2018 by Omkar
• 65,820 points

