Error running Hadoop MapReduce in Python using Hadoop Streaming


I was trying a sample MapReduce program written in Python with Hadoop Streaming on the Cloudera QuickStart VM, but I am stuck.

Here is my mapper code:

#!/usr/bin/env python

import re
import sys

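for line in sys.stdin:
  val = line.strip()
  # Fixed-width record: slice out year, temperature and quality code
  (year, temp, q) = (val[15:19], val[87:92], val[92:93])
  # Skip missing readings (+9999) and records with a bad quality code
  if (temp != "+9999" and re.match("[01459]", q)):
    print "%s\t%s" % (year, temp)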

Here is my reducer code:

#!/usr/bin/env python

import sys

(last_key, max_val) = (None, -sys.maxint)
for line in sys.stdin:
  (key, val) = line.strip().split("\t")
  if last_key and last_key != key:
    print "%s\t%s" % (last_key, max_val)
    (last_key, max_val) = (key, int(val))
  else:
    (last_key, max_val) = (key, max(max_val, int(val)))

# Emit the maximum for the final key
if last_key:
  print "%s\t%s" % (last_key, max_val)

source: https://github.com/tomwhite/hadoop-book/tree/master/ch02-mr-intro/src/main/python

This is the command that I am executing in order to run the mapreduce job:


hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming.jar \
-input /user/cloudera/sample.txt \
-output /user/cloudera/output
-mapper /home/cloudera/streaming-sample/max_temperature_map.py \
-reducer /home/cloudera/streaming-sample/max_temperature_reduce.py

This is the error log snippet that I am getting:

[image: screenshot of the error log]

Please help me understand what I am doing wrong here.

Apr 3, 2018 in Big Data Hadoop by nitinrawat895

1 answer to this question.


Hi

Because you wrote the mapper and reducer scripts yourself, they have to be shipped with the job. Add a -file option naming each script before the corresponding -mapper and -reducer options.

You can run the command below.

$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming.jar \
-input /user/cloudera/sample.txt \
-output /user/cloudera/output \
-file max_temperature_map.py  -mapper /home/cloudera/streaming-sample/max_temperature_map.py \
-file max_temperature_reduce.py  -reducer /home/cloudera/streaming-sample/max_temperature_reduce.py

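The change that matters is the pair of -file options added before -mapper and -reducer. Because the -file paths here are relative, run the command from /home/cloudera/streaming-sample, and make sure every line except the last ends with a backslash so the shell reads it as one command.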
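
As a rough illustration of how -file and -mapper fit together, here is a minimal Python sketch that assembles a streaming argument list. The helper name build_streaming_command is made up for this example and is not part of Hadoop. It follows the common streaming convention of giving -file the local path and giving -mapper/-reducer the base name, which assumes the scripts are executable and keep their #! line.

```python
import os


def build_streaming_command(jar, input_path, output_path, mapper, reducer):
    """Return the argument list for a Hadoop Streaming job (hypothetical helper)."""
    return [
        "hadoop", "jar", jar,
        "-input", input_path,
        "-output", output_path,
        # -file ships the local script along with the job
        "-file", mapper,
        # the shipped copy sits in the task's working directory,
        # so it can be referred to by its base name
        "-mapper", os.path.basename(mapper),
        "-file", reducer,
        "-reducer", os.path.basename(reducer),
    ]


cmd = build_streaming_command(
    "/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming.jar",
    "/user/cloudera/sample.txt",
    "/user/cloudera/output",
    "/home/cloudera/streaming-sample/max_temperature_map.py",
    "/home/cloudera/streaming-sample/max_temperature_reduce.py",
)
print(" ".join(cmd))
```

Printing the list first is a cheap way to check the options. Passing it to subprocess.check_call(cmd) would then submit the job.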

Hope this will solve your problem.
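
If the job still fails, it may help to rule out the scripts themselves before going back to the cluster. The sketch below mirrors the mapper and reducer logic from the question as plain functions and uses sorted() to stand in for the sort that Hadoop Streaming performs between the map and reduce phases. The function names and the make_record helper are assumptions made for this example, and the records it builds are synthetic test data, not real weather records. It is written to run on Python 2 or 3.

```python
import re


def map_lines(lines):
    """Yield 'year<TAB>temp' for each valid record, like the mapper."""
    for line in lines:
        val = line.strip()
        year, temp, q = val[15:19], val[87:92], val[92:93]
        if temp != "+9999" and re.match("[01459]", q):
            yield "%s\t%s" % (year, temp)


def reduce_lines(sorted_lines):
    """Yield 'year<TAB>max' from key-sorted input, like the reducer."""
    last_key, max_val = None, None
    for line in sorted_lines:
        key, val = line.strip().split("\t")
        if last_key is not None and last_key != key:
            # key changed: emit the finished group and start a new one
            yield "%s\t%s" % (last_key, max_val)
            max_val = None
        last_key = key
        max_val = int(val) if max_val is None else max(max_val, int(val))
    if last_key is not None:
        yield "%s\t%s" % (last_key, max_val)


def make_record(year, temp, quality):
    """Build a synthetic 93-character record with fields at the mapper's offsets."""
    return "0" * 15 + year + "0" * 68 + temp + quality


def run_local(lines):
    """Simulate map -> sort -> reduce in one process."""
    return list(reduce_lines(sorted(map_lines(lines))))


sample = [
    make_record("1950", "+0000", "1"),
    make_record("1950", "+0022", "1"),
    make_record("1949", "+0111", "1"),
    make_record("1950", "+9999", "1"),  # missing reading, dropped by the mapper
]
for out in run_local(sample):
    print(out)
```

The same check can be done with the real scripts by piping a few lines of sample.txt through the mapper, sort, and the reducer on the local shell.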

Thank You 

answered Jan 21, 2020 by anonymous
