Error running hadoop mapreduce in Python using Hadoop Streaming

I was trying out a sample MapReduce program written in Python using Hadoop Streaming on the Cloudera QuickStart VM, but I am stuck partway through.

Here is my mapper code:

#!/usr/bin/env python

import re
import sys

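for line in sys.stdin:
  val = line.strip()
  # Fixed-width fields: year, temperature, and quality code
  (year, temp, q) = (val[15:19], val[87:92], val[92:93])
  # Skip missing readings (+9999) and readings with a bad quality code
  if (temp != "+9999" and re.match("[01459]", q)):
    print "%s\t%s" % (year, temp)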
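
To check the slice offsets without a cluster, the mapper's field extraction can be pulled into a function and run on a synthetic record. This is only a sketch: `parse_record` and `make_record` are hypothetical helper names, and the test line is made-up filler rather than real data. The offsets 15:19, 87:92 and 92:93 are taken from the mapper above.

```python
import re

def parse_record(line):
    """Return (year, temp) for a valid reading, or None if it should be skipped."""
    val = line.strip()
    year, temp, q = val[15:19], val[87:92], val[92:93]
    if temp != "+9999" and re.match("[01459]", q):
        return year, temp
    return None

def make_record(year, temp, quality):
    """Build a synthetic 93-character line (hypothetical helper; padding is filler)."""
    chars = ["0"] * 93
    chars[15:19] = year      # 4 characters
    chars[87:92] = temp      # 5 characters, sign included
    chars[92:93] = quality   # 1 character
    return "".join(chars)

print(parse_record(make_record("1950", "+0022", "1")))
```

A record with temperature "+9999" or a quality code outside 0, 1, 4, 5, 9 makes `parse_record` return None, which mirrors the mapper printing nothing for that line.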

Here is my reducer code:

#!/usr/bin/env python

import sys

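(last_key, max_val) = (None, -sys.maxint)
for line in sys.stdin:
  (key, val) = line.strip().split("\t")
  if last_key and last_key != key:
    # Key changed: emit the maximum for the previous key and start over
    print "%s\t%s" % (last_key, max_val)
    (last_key, max_val) = (key, int(val))
  else:
    # Same key (or first line): keep the running maximum
    (last_key, max_val) = (key, max(max_val, int(val)))

# Emit the maximum for the final key
if last_key:
  print "%s\t%s" % (last_key, max_val)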
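
The reducer relies on Hadoop Streaming delivering its input sorted by key, so that all lines for one year arrive together. A minimal sketch of the same max-per-key logic can be run locally on a few made-up lines. It uses `itertools.groupby` instead of the manual `last_key` tracking, so it is a swapped-in technique and not the code above. `max_per_key` is a hypothetical name, and `sorted()` stands in for the shuffle/sort phase.

```python
from itertools import groupby

def max_per_key(lines):
    """Yield (key, max_value) for tab-separated lines already sorted by key."""
    pairs = (line.strip().split("\t") for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, max(int(val) for _, val in group)

# Made-up mapper output; sorted() stands in for the shuffle/sort phase.
sample = ["1950\t+0022", "1949\t+0111", "1950\t-0011", "1949\t+0078"]
result = list(max_per_key(sorted(sample)))
print(result)
```

If the input is not sorted by key, both this sketch and the reducer above would emit the same year more than once, which is why the sort step matters when testing outside Hadoop.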


This is the command that I am executing in order to run the mapreduce job:

hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming.jar \
-input /user/cloudera/sample.txt \
-output /user/cloudera/output \
-mapper /home/cloudera/streaming-sample/ \
-reducer /home/cloudera/streaming-sample/

This is the error log snippet that I am getting:


Please help me understand what I am doing wrong here.

Apr 2, 2018 in Big Data Hadoop by nitinrawat895
• 10,950 points

1 answer to this question.


Because you wrote the mapper and reducer programs yourself, they have to be shipped along with the job. To do that, pass the "-file" option with each script's file name before -mapper and -reducer.

You can run the command below.

$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming.jar \
-input /user/cloudera/sample.txt \
-output /user/cloudera/output \
-file  -mapper /home/cloudera/streaming-sample/ \
-file  -reducer /home/cloudera/streaming-sample/

Just add the -file options as shown above and run your command again.
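
To make the role of each option explicit, the command can also be assembled in Python. This is a sketch under assumptions: the script names `mapper.py` and `reducer.py` are hypothetical placeholders (the actual file names are not shown above), and `build_streaming_cmd` is an illustrative helper, not part of Hadoop. `-file` ships a local script with the job, and `-mapper` / `-reducer` then name the shipped script by its base name.

```python
import os

def build_streaming_cmd(jar, input_path, output_path, mapper, reducer):
    """Return the argv list for a Hadoop Streaming job that ships both scripts."""
    return [
        "hadoop", "jar", jar,
        "-input", input_path,
        "-output", output_path,
        # Ship each local script, then refer to it by base name on the task side.
        "-file", mapper, "-mapper", os.path.basename(mapper),
        "-file", reducer, "-reducer", os.path.basename(reducer),
    ]

cmd = build_streaming_cmd(
    "/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming.jar",
    "/user/cloudera/sample.txt",
    "/user/cloudera/output",
    "/home/cloudera/streaming-sample/mapper.py",   # hypothetical file name
    "/home/cloudera/streaming-sample/reducer.py",  # hypothetical file name
)
print(" ".join(cmd))
```

The resulting list could be handed to `subprocess.call(cmd)` on a machine where the `hadoop` client is installed. Building it as a list avoids the line-continuation mistakes that are easy to make when typing the command by hand.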

Hope this will solve your problem.

Thank You 

answered Jan 21 by anonymous
