Python read file as stream from HDFS

I need to read a file stored in HDFS from Python, but I am unable to, apparently because of some issue with my HDFS setup. I get a warning saying there is not enough memory to hold the whole file.

I would like to avoid keeping the whole file cached in memory and instead process it as it is read. I am looking for a way to do this without any additional libraries. I was thinking of calling the standard Hadoop command-line tools through Python's subprocess module, but I can't get that to do what I need: no command-line tool performs my processing, and I want to run a Python function on every line in a streaming fashion.

Is there a way to use a Python function as the right-hand side of a pipe with the subprocess module? Or, even better, to open the file like a regular file, as a generator, so I can process each line easily?
May 30, 2019 in Big Data Hadoop by nitinrawat895

1 answer to this question.


I could point you to a Python library that should help with this. If that does not work for you, I suggest reading from the stdout pipe of your Popen object instead: iterating over it yields one line at a time, so the whole file never has to be held in memory.

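```python
import subprocess
# Stream the file through "hadoop fs -cat"; stdout is read one line at a time
cat = subprocess.Popen(["hadoop", "fs", "-cat", "/path/to/myfile"], stdout=subprocess.PIPE, text=True)
for line in cat.stdout:
    print(line, end="")  # replace with your own per-line processing
cat.wait()
```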
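The question also asks whether the file can be opened "like a file as a generator". A minimal sketch of that, assuming Python 3.7+ (for `text=True`) and the `hadoop` executable on the PATH, wraps the same `Popen` call in a generator function. The helper names `hdfs_lines` and `process_hdfs_file` and their `cmd` parameter are hypothetical, introduced only for this illustration.

```python
import subprocess
from typing import Callable, Iterator, Sequence

# hdfs_lines and process_hdfs_file are hypothetical helper names used for illustration.


def hdfs_lines(path: str,
               cmd: Sequence[str] = ("hadoop", "fs", "-cat")) -> Iterator[str]:
    """Yield the lines of `path` one at a time, without loading the whole file.

    `cmd` defaults to the Hadoop CLI; it is a parameter (an assumption, not a
    Hadoop feature) so the helper can be tried with any line-printing command.
    """
    proc = subprocess.Popen([*cmd, path], stdout=subprocess.PIPE, text=True)
    finished = False
    try:
        for line in proc.stdout:      # the pipe is read lazily, line by line
            yield line.rstrip("\n")
        finished = True
    finally:
        proc.stdout.close()
        if not finished:              # consumer stopped early (break / close())
            proc.kill()               # so stop the child instead of draining it
        proc.wait()
    if proc.returncode != 0:          # e.g. the HDFS path does not exist
        raise subprocess.CalledProcessError(proc.returncode, proc.args)


def process_hdfs_file(path: str, func: Callable[[str], None],
                      cmd: Sequence[str] = ("hadoop", "fs", "-cat")) -> None:
    """Apply `func` to every line of the file as it streams in."""
    for line in hdfs_lines(path, cmd):
        func(line)
```

For example, `process_hdfs_file("/path/to/myfile", print)` prints each line as it arrives. The return code is checked only after the loop has run to completion, so a genuine failure such as a missing path raises `CalledProcessError`, while a consumer that stops early simply kills the child process.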
answered May 30, 2019 by ravikiran