Error running Hadoop MapReduce in Python using Hadoop Streaming

0 votes

I was trying to run a sample MapReduce program written in Python using Hadoop Streaming on the Cloudera QuickStart VM, but I am stuck.

Here is my mapper code:

#!/usr/bin/env python

import re
import sys

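for line in sys.stdin:
  val = line.strip()
  # Fixed-width fields: year, temperature, quality code
  (year, temp, q) = (val[15:19], val[87:92], val[92:93])
  # Skip the +9999 placeholder and records whose quality code is not 0, 1, 4, 5, or 9
  if (temp != "+9999" and re.match("[01459]", q)):
    print "%s\t%s" % (year, temp)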

Here is my reducer code:

#!/usr/bin/env python

import sys

(last_key, max_val) = (None, -sys.maxint)
for line in sys.stdin:
  (key, val) = line.strip().split("\t")
  # Input arrives sorted by key, so a key change means the previous key is complete
  if last_key and last_key != key:
    print "%s\t%s" % (last_key, max_val)
    (last_key, max_val) = (key, int(val))
  else:
    (last_key, max_val) = (key, max(max_val, int(val)))

# Emit the result for the final key
if last_key:
  print "%s\t%s" % (last_key, max_val)

source: https://github.com/tomwhite/hadoop-book/tree/master/ch02-mr-intro/src/main/python
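The map, sort, and reduce steps that these two scripts perform can also be tried locally, without Hadoop, to check the logic before submitting a job. The following is only a rough sketch: `map_record`, `reduce_sorted`, and `make_record` are hypothetical helper names, and the records are synthetic (zero padding with made-up values), not real weather data.

```python
import re
from itertools import groupby


def map_record(line):
    """Mirror the mapper: yield (year, temp) for a record that passes the filter."""
    val = line.strip()
    year, temp, q = val[15:19], val[87:92], val[92:93]
    if temp != "+9999" and re.match("[01459]", q):
        yield year, temp


def reduce_sorted(pairs):
    """Mirror the reducer: pairs must be sorted by key; yield (key, max value)."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, max(int(val) for _, val in group)


def make_record(year, temp, quality):
    """Build a synthetic 93-character record (assumption: zero padding only)."""
    return "0" * 15 + year + "0" * 68 + temp + quality


records = [
    make_record("1950", "+0022", "1"),
    make_record("1950", "-0011", "1"),
    make_record("1949", "+0111", "1"),
    make_record("1949", "+9999", "1"),  # dropped by the temp != "+9999" check
    make_record("1949", "+0078", "2"),  # dropped: quality code 2 is not accepted
]

mapped = [pair for line in records for pair in map_record(line)]
mapped.sort(key=lambda kv: kv[0])  # stands in for Hadoop's sort between map and reduce
results = list(reduce_sorted(mapped))

for year, max_temp in results:
    print("%s\t%s" % (year, max_temp))
```

If the output here looks right, a failure on the cluster is more likely caused by how the job is submitted than by the scripts themselves.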

This is the command that I am executing in order to run the mapreduce job:


hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming.jar \
-input /user/cloudera/sample.txt \
-output /user/cloudera/output \
-mapper /home/cloudera/streaming-sample/max_temperature_map.py \
-reducer /home/cloudera/streaming-sample/max_temperature_reduce.py

This is the error log snippet that I am getting:

[image: screenshot of the error log, not reproduced here]

Please help me understand what I am doing wrong here.

Apr 2, 2018 in Big Data Hadoop by nitinrawat895

1 answer to this question.

0 votes

Hi

Because you wrote the mapper and reducer programs yourself, you have to ship them with the job: pass the "-file" option with each script's file name before -mapper and -reducer.

You can run the command below.

$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming.jar \
-input /user/cloudera/sample.txt \
-output /user/cloudera/output \
-file max_temperature_map.py -mapper /home/cloudera/streaming-sample/max_temperature_map.py \
-file max_temperature_reduce.py -reducer /home/cloudera/streaming-sample/max_temperature_reduce.py

Just add the -file options shown above and run your command again.
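If you launch the job from a script, building the command as an argument list avoids shell line-continuation mistakes. This is only a sketch: `build_streaming_command` is a hypothetical helper, the paths are the ones from this thread, and it only builds and prints the command without running Hadoop.

```python
import shlex


def build_streaming_command(jar, input_path, output_path, mapper, reducer, files=()):
    """Return the hadoop streaming command as a list of arguments."""
    cmd = ["hadoop", "jar", jar, "-input", input_path, "-output", output_path]
    # Each script to ship with the job gets its own -file option
    for path in files:
        cmd += ["-file", path]
    cmd += ["-mapper", mapper, "-reducer", reducer]
    return cmd


cmd = build_streaming_command(
    jar="/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming.jar",
    input_path="/user/cloudera/sample.txt",
    output_path="/user/cloudera/output",
    mapper="/home/cloudera/streaming-sample/max_temperature_map.py",
    reducer="/home/cloudera/streaming-sample/max_temperature_reduce.py",
    files=["max_temperature_map.py", "max_temperature_reduce.py"],
)

# Print a shell-safe, single-line version of the command for inspection
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list could then be passed to `subprocess.run(cmd)` from the directory that contains the two scripts, since the `-file` paths here are relative.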

Hope this will solve your problem.

Thank You 

answered Jan 21 by anonymous
