Can we integrate Hadoop with Python?

My project uses a Python script to analyze data. Until now I have fed it .txt files as input, but as the data grows I have to move my storage to Hadoop HDFS. How can I pass HDFS data to my Python script? Is there a way to do that?

Aug 23, 2018 in Big Data Hadoop by Neha
• 6,280 points

1 answer to this question.

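Yes. Hadoop Streaming lets you write the mapper and reducer in any language that can read from stdin and write to stdout, such as Python or PHP. For example, with PHP scripts:

hadoop jar hadoop/tools/lib/hadoop-streaming-2.7.2.jar \
-mapper /mapper.php \
-reducer /reducer.php \
-input /hdfsinputpath \
-output /hdfsoutputpath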

The general form of a Hadoop Streaming command is:

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
-input myInputDirs \
-output myOutputDir \
-mapper /bin/cat \
-reducer /bin/wc
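
Since the question is about Python, here is a minimal sketch of what the mapper and reducer could look like as a single script. It is only an illustration: the file name wordcount.py, the word-count task, and the map/reduce command-line argument are my own assumptions, not something Hadoop requires. The one real contract is that each step reads lines from stdin and writes tab-separated key/value lines to stdout, and that the reducer receives its input sorted by key.

```python
#!/usr/bin/env python3
"""Hypothetical wordcount.py: run as `wordcount.py map` or `wordcount.py reduce`."""
import sys
from itertools import groupby


def map_lines(lines):
    """Emit one 'word<TAB>1' record for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Sum the counts per word. Relies on the input being sorted by key,
    which is how Hadoop Streaming delivers records to a reducer."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Only touch stdin when a mode is given explicitly on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    step = {"map": map_lines, "reduce": reduce_lines}[sys.argv[1]]
    for record in step(sys.stdin):
        print(record)
```

You would then pass something like "python3 wordcount.py map" as the -mapper and "python3 wordcount.py reduce" as the -reducer in place of /bin/cat and /bin/wc. The script also has to be available on the worker nodes; Hadoop Streaming has options for shipping files with a job, so check the documentation for your version. You can try the logic locally first with: cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce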

For a full walkthrough of writing a MapReduce program in Python, see: http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/

Another option is to embed Pig Latin statements and Pig commands in a Python script using a JDBC-like compile, bind, run model. For Python, make sure the Jython jar is included in your classpath. See the Apache Pig documentation for more details: https://pig.apache.org/docs/r0.9.1/cont.html#embed-python

I hope this helps :)

answered Aug 23, 2018 by Frankie
• 9,810 points
