Python read file as stream from HDFS

0 votes
I have a python read file in HDFS which I need to run, but due to some reasons or issues in my HDFS, I am unable to run it. I am getting a warning which says not enough to fit all in memory.

I wish to clear all cache and process this read file. I am searching for a method to do it without using any of the additional libraries. I was thinking to do this using the standard "Hadoop" command line tools using the Python subprocess module, but I can't seem to be able to do what I need since there are no command line tools that would do my processing and I would like to execute a Python function for every line in a streaming fashion.

Is there a way to apply Python functions as right operands of the pipes using the subprocess module? Or even better, open it like a file as a generator so I could process each line easily?
May 30, 2019 in Big Data Hadoop by nitinrawat895
• 11,380 points
1,565 views

1 answer to this question.

0 votes

I could redirect to a Python library which should help you fix the issue If it does not work, then, I can suggest you get the stdout pipe from your Popen object. 

#cat = subprocess.Popen(["hadoop", "fs", "-cat", "/path/to/myfile"], stdout=subprocess.PIPE)
#for line in cat.stdout:
#print line
answered May 30, 2019 by ravikiran
• 4,620 points

Related Questions In Big Data Hadoop

0 votes
2 answers

Not Able to read the file from hdfs location

Please make sure you connect to spark2-shell ...READ MORE

answered Jul 14, 2020 in Big Data Hadoop by Shantanu
• 190 points
1,749 views
0 votes
1 answer
0 votes
1 answer

Copy file from HDFS to the local file system

There are two possible ways to copy ...READ MORE

answered Mar 27, 2018 in Big Data Hadoop by nitinrawat895
• 11,380 points
16,413 views
0 votes
1 answer

Error while copying the file from local to HDFS

Well, the reason you are getting such ...READ MORE

answered May 3, 2018 in Big Data Hadoop by Ashish
• 2,650 points
3,673 views
+1 vote
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 11,380 points
10,555 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 11,380 points
2,184 views
+2 votes
11 answers

hadoop fs -put command?

Hi, You can create one directory in HDFS ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by nitinrawat895
• 11,380 points
104,199 views
–1 vote
1 answer

Hadoop dfs -ls command?

In your case there is no difference ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by kurt_cobain
• 9,390 points
4,260 views
0 votes
1 answer

Issue with Python Read file as stream from HDFS.

The easiest way is using the following ...READ MORE

answered Jun 26, 2019 in Big Data Hadoop by ravikiran
• 4,620 points
2,830 views
0 votes
1 answer
webinar REGISTER FOR FREE WEBINAR X
REGISTER NOW
webinar_success Thank you for registering Join Edureka Meetup community for 100+ Free Webinars each month JOIN MEETUP GROUP