Python read file as stream from HDFS

I have a file in HDFS that I need to read with Python, but because of some issue with my HDFS setup I cannot process it. I get a warning saying there is not enough memory to fit the whole file.

I want to clear the cache and process this file without using any additional libraries. I was thinking of calling the standard Hadoop command-line tools through the Python subprocess module, but no command-line tool does the processing I need, and I would like to run a Python function on every line in a streaming fashion.

Is there a way to use Python functions as the right-hand side of a pipe with the subprocess module? Or, even better, to open the HDFS file like a regular file, as a generator, so I can process each line easily?
May 30 in Big Data Hadoop by nitinrawat895

1 answer to this question.


I could point you to a Python library that should help with this. If that does not work for you, I suggest reading from the stdout pipe of your Popen object, which lets you iterate over the file line by line instead of loading it all into memory:

    import subprocess

    cat = subprocess.Popen(["hadoop", "fs", "-cat", "/path/to/myfile"], stdout=subprocess.PIPE)
    for line in cat.stdout:  # each line arrives as bytes, newline included
        print(line)
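
Since the question also asks about opening the file as a generator, here is a minimal sketch of how the same pipe could be wrapped in one. The names stream_lines and hdfs_lines are hypothetical helpers made up for illustration, and the sketch assumes the hadoop executable is on your PATH and that the file is UTF-8 text.

```python
import subprocess


def stream_lines(command):
    """Run `command` and yield its stdout one decoded line at a time."""
    proc = subprocess.Popen(command, stdout=subprocess.PIPE)
    try:
        for raw in proc.stdout:
            # Lines arrive as bytes; assume UTF-8 and drop the line ending.
            yield raw.decode("utf-8").rstrip("\r\n")
    finally:
        proc.stdout.close()
        returncode = proc.wait()
    # Only reached when the output was read to the end.
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, command)


def hdfs_lines(path):
    """Hypothetical helper: stream an HDFS file through `hadoop fs -cat`."""
    return stream_lines(["hadoop", "fs", "-cat", path])
```

You could then write "for line in hdfs_lines("/path/to/myfile"):" and call your own function on each line. Only one line is decoded at a time, so the whole file never has to fit in memory, and a non-zero exit code from the command is raised as an error once the output has been read to the end.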
answered May 30 by ravikiran

© 2018 Brain4ce Education Solutions Pvt. Ltd. All rights Reserved.