Hadoop/Spark: how to iterate HDFS directories?
You can use org.apache.hadoop.fs.FileSystem.
Using Spark (Scala):
FileSystem.get(sc.hadoopConfiguration).listFiles(..., true)
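listFiles returns a RemoteIterator rather than a collection, so you have to step through it explicitly. A minimal sketch, assuming a Scala Spark shell and using hdfs:///tmp as a placeholder path:

import org.apache.hadoop.fs.{FileSystem, Path}

// listFiles(path, recursive = true) walks the whole tree under the path
val fs = FileSystem.get(sc.hadoopConfiguration)
val files = fs.listFiles(new Path("hdfs:///tmp"), true)
while (files.hasNext) {
  // each entry is a LocatedFileStatus describing one file
  println(files.next().getPath)
}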
Using PySpark:

hadoop = sc._jvm.org.apache.hadoop
fs = hadoop.fs.FileSystem
conf = hadoop.conf.Configuration()
path = hadoop.fs.Path('/hivewarehouse/disc_mrt.db/unified_fact/')
for f in fs.get(conf).listStatus(path):
    print(f.getPath())
import org.apache.hadoop.fs.{FileSystem, Path}

FileSystem.get(sc.hadoopConfiguration)
  .listStatus(new Path("hdfs:///tmp"))
  .foreach(x => println(x.getPath))
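If you only need the sub-directories (or only the plain files) at that level, FileStatus exposes isDirectory and isFile, so you can filter what listStatus returns. A sketch along the same lines, again with hdfs:///tmp as a placeholder:

import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(sc.hadoopConfiguration)
fs.listStatus(new Path("hdfs:///tmp"))
  .filter(_.isDirectory)               // use .filter(_.isFile) for regular files instead
  .foreach(status => println(status.getPath))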