Python read file as stream from HDFS

I have a file in HDFS that I need to read and process with Python, but I am unable to do so because of some issue with my HDFS setup. I get a warning saying there is not enough memory to fit all of it.

I want to clear all caches and process this file, and I am looking for a way to do it without any additional libraries. I considered calling the standard Hadoop command-line tools through the Python subprocess module, but that does not seem to do what I need: no command-line tool performs my processing, and I would like to execute a Python function on every line in a streaming fashion.

Is there a way to use Python functions as the right-hand operand of a pipe with the subprocess module? Or, even better, to open the file as a generator so I can process each line easily?
May 30 in Big Data Hadoop by nitinrawat895

1 answer to this question.

I could point you to a Python library that should help with this. If that does not work for you, I suggest reading from the stdout pipe of your Popen object:

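import subprocess

# Stream the file out of HDFS and read it line by line instead of loading it all into memory
cat = subprocess.Popen(["hadoop", "fs", "-cat", "/path/to/myfile"], stdout=subprocess.PIPE, universal_newlines=True)
for line in cat.stdout:
    print(line, end="")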
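
To get the "open it like a file as a generator" behaviour the question asks about, the same Popen idea can be wrapped in a generator function. What follows is only a minimal sketch, assuming Python 3 and that the hadoop command is on the PATH. The names stream_lines, hdfs_lines and process are hypothetical helpers introduced here for illustration; they are not part of Hadoop or of any library.

```python
import subprocess


def stream_lines(cmd):
    """Yield the stdout of the command `cmd` one line at a time, as text."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE,
                          universal_newlines=True) as proc:
        for line in proc.stdout:
            yield line
    # Leaving the `with` block waits for the process, so returncode is set.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)


def hdfs_lines(path):
    """Hypothetical helper: stream an HDFS file through `hadoop fs -cat`."""
    return stream_lines(["hadoop", "fs", "-cat", path])


def process(line):
    """Hypothetical per-line function; replace with your own logic."""
    return line.rstrip("\n").upper()
```

With these helpers, a loop such as "for line in hdfs_lines("/path/to/myfile"): print(process(line))" handles one line at a time, so the whole file never has to sit in memory. The return-code check is a design choice: it turns a failed hadoop call (for example, a missing path) into an exception instead of a silently empty stream.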
answered May 30 by ravikiran
