Can we integrate Hadoop with Python?

0 votes

My project uses a Python script to analyze data. Previously, I fed it .txt files as input, but as the data has grown I have to move my storage to Hadoop HDFS. How can I feed HDFS data to my Python script? Is there any way of doing that?

Aug 23, 2018 in Big Data Hadoop by Neha
• 6,300 points
1,343 views

1 answer to this question.

0 votes

Use Hadoop Streaming to run mappers and reducers written in Python, PHP, and other languages. For example:

hadoop jar hadoop/tools/lib/hadoop-streaming-2.7.2.jar \
-mapper /mapper.php \
-reducer /reducer.php \
-input /hdfsinputpath \
-output /hdfsoutputpath

Hadoop Streaming API:

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
-input myInputDirs \
-output myOutputDir \
-mapper /bin/cat \
-reducer /bin/wc

A detailed tutorial on writing a Hadoop MapReduce program in Python is here: http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
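With Hadoop Streaming, a Python mapper or reducer is an ordinary script that reads lines from standard input and writes tab-separated key/value lines to standard output. Below is a minimal word-count sketch of that pattern. The file name wordcount.py, the "map"/"reduce" command-line argument, and the function names are assumptions made for illustration; they do not come from the answer or from Hadoop itself.

```python
#!/usr/bin/env python3
"""Word count for Hadoop Streaming (hypothetical file name: wordcount.py)."""
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum the counts for each word.

    Hadoop sorts mapper output by key before the reducer reads it, so
    records for the same word arrive on consecutive lines.
    """
    # Skip any line that is not a 'key<TAB>value' record.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if "\t" in line)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Run a step only when "map" or "reduce" is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1 and sys.argv[1] in ("map", "reduce"):
    step = mapper if sys.argv[1] == "map" else reducer
    for record in step(sys.stdin):
        print(record)
```

You can check the logic without a cluster by imitating the shuffle with sort: cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce. In a streaming job, the same commands would go in the -mapper and -reducer options, and the script has to be made available on the cluster nodes.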

Alternatively, you can embed Pig Latin statements and Pig commands in a Python script using a JDBC-like compile, bind, run model. For Python, make sure the Jython jar is on your classpath. Refer to the Apache Pig documentation for more details: https://pig.apache.org/docs/r0.9.1/cont.html#embed-python
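The compile, bind, run model can be sketched as follows. This is only an illustration: the Pig Latin script, the $input and $output placeholder names, and the helper functions are assumptions, and the code is meant to run under Pig's Jython runtime (for example by passing the script to the pig launcher), not under the regular CPython interpreter. The Pig import is kept inside the function because that class is only available there.

```python
# Hypothetical Pig Latin script; $input and $output are placeholders
# that get filled in at bind time.
PIG_SCRIPT = """
records = LOAD '$input' AS (line:chararray);
STORE records INTO '$output';
"""


def build_params(input_path, output_path):
    """Map the $input and $output placeholders to concrete HDFS paths."""
    return {"input": input_path, "output": output_path}


def run_copy_job(input_path, output_path):
    """Compile, bind, and run the script; return True if the job succeeded."""
    # Only importable inside Pig's Jython runtime.
    from org.apache.pig.scripting import Pig

    compiled = Pig.compile(PIG_SCRIPT)                             # compile
    bound = compiled.bind(build_params(input_path, output_path))   # bind
    stats = bound.runSingle()                                      # run
    return stats.isSuccessful()
```

A call such as run_copy_job("/hdfsinputpath", "/hdfsoutputpath") would then load the input from HDFS and store it at the output path; both paths here are placeholders.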

I hope this helps :)

answered Aug 23, 2018 by Frankie
• 9,830 points
