How to execute a Python script in the Hadoop file system (HDFS)?

0 votes

By default, Hadoop lets us run Java code. But now I want to run this Python script:

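import os

def transform():
    # Delete every file in the "input" folder (os functions work on the local disk, not HDFS)
    inputfolder = "input"
    for filename in os.listdir(inputfolder):
        # os.path.join uses the correct path separator for the platform
        path = os.path.join(inputfolder, filename)
        os.remove(path)

def main():
    transform()

if __name__ == "__main__":
    main()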

How can I run a .py file instead of a .jar file?

Sep 19, 2018 in Big Data Hadoop by slayer
• 29,370 points

recategorized Sep 19, 2018 by slayer 13,510 views

1 answer to this question.

0 votes

If you simply want to distribute your Python script across the cluster, use Hadoop Streaming.

The basic syntax of the command looks like this (from https://hadoop.apache.org/docs/r1.2.1/streaming.html):

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
-input myInputDirs \
-output myOutputDir \
-mapper myPythonScript.py \
-file myPythonScript.py

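This creates a MapReduce job that runs your Python script as the mapper. With Hadoop Streaming, the script reads its input records from stdin and writes its output to stdout.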
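For illustration, here is a minimal sketch of what a script passed to -mapper might look like. It is not the script from the question: the word-splitting logic and the name map_lines are hypothetical, chosen only to show the stdin-to-stdout pattern that Hadoop Streaming relies on. It also assumes the file is executable and that the shebang line points at a Python interpreter available on the cluster nodes.

```python
#!/usr/bin/env python
# Hypothetical streaming mapper: emits "<word><TAB>1" for every word it reads.
import sys


def map_lines(lines):
    """Yield one tab-separated key/value record per word in the input lines."""
    for line in lines:
        for word in line.split():
            yield word + "\t1"


def main():
    # Hadoop Streaming feeds input records to the mapper on stdin
    # and collects whatever the mapper prints to stdout.
    for record in map_lines(sys.stdin):
        print(record)


if __name__ == "__main__":
    main()
```

Because the script only uses stdin and stdout, you can probably try it locally before submitting the job, for example by piping a small text file into it with cat.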

Hope this helps :-)

answered Sep 19, 2018 by digger
• 26,740 points
