Hadoop HDFS: list all files in a directory and its subdirectories

0 votes
I have a folder in HDFS that contains subfolders and files. How can I list all of them, including the files inside the subfolders? Please help. Thanks.
Oct 26, 2018 in Big Data Hadoop by slayer
• 29,040 points

recategorized Oct 26, 2018 by Omkar 1,012 views

3 answers to this question.

0 votes

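You can use FileSystem.listFiles() with the recursive flag set to true. It returns only files and descends into every subdirectory:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.mapreduce.Job;

// Assumes the enclosing class provides getConf(), e.g. by extending Configured.
Configuration conf = getConf();
Job job = Job.getInstance(conf);
FileSystem fs = FileSystem.get(conf);

// The second argument (true) makes the listing recursive.
RemoteIterator<LocatedFileStatus> fileStatusListIterator = fs.listFiles(
        new Path("path/to/lib"), true);
while (fileStatusListIterator.hasNext()) {
    LocatedFileStatus fileStatus = fileStatusListIterator.next();
    // Do something with each file, for example add it to the job classpath.
    job.addFileToClassPath(fileStatus.getPath());
}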
answered Oct 26, 2018 by Omkar
• 66,910 points
0 votes
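You can also walk the tree recursively with listStatus(). This example counts every file under an input path (which may be a glob pattern) and returns the count as the number of reducers:

private int calculateNumberOfReducers(String input) throws IOException {
    int numberOfReducers = 0;
    Path inputPath = new Path(input);
    FileSystem fs = inputPath.getFileSystem(getConf());
    FileStatus[] statuses = fs.globStatus(inputPath);
    // globStatus returns null when a non-glob path does not exist.
    if (statuses == null) {
        return 0;
    }
    for (FileStatus status : statuses) {
        if (status.isDirectory()) {
            // Count every file below this directory.
            numberOfReducers += getNumberOfInputFiles(status, fs);
        } else if (status.isFile()) {
            numberOfReducers++;
        }
    }
    return numberOfReducers;
}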

/**
 * Recursively determines number of input files in an HDFS directory
 *
 * @param status instance of FileStatus
 * @param fs instance of FileSystem
 * @return number of input files within particular HDFS directory
 * @throws IOException if a directory listing fails
 */
private int getNumberOfInputFiles(FileStatus status, FileSystem fs) throws IOException {
    int inputFileCount = 0;
    if(status.isDirectory()) {
        FileStatus[] files = fs.listStatus(status.getPath());
        for(FileStatus file: files) {
            inputFileCount += getNumberOfInputFiles(file, fs);
        }
    } else {
        inputFileCount ++;
    }

    return inputFileCount;
}
answered Dec 4, 2018 by Jayant
0 votes

You can do it iteratively with a queue instead of recursion:

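import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

private static List<String> listAllFilePath(Path hdfsFilePath, FileSystem fs)
    throws IOException {
  List<String> filePathList = new ArrayList<String>();
  // Paths still to be visited, starting with the root.
  Queue<Path> fileQueue = new LinkedList<Path>();
  fileQueue.add(hdfsFilePath);
  while (!fileQueue.isEmpty()) {
    Path filePath = fileQueue.remove();
    if (fs.isFile(filePath)) {
      // A file: record its full path.
      filePathList.add(filePath.toString());
    } else {
      // A directory: enqueue its children so they are visited later.
      FileStatus[] fileStatus = fs.listStatus(filePath);
      for (FileStatus fileStat : fileStatus) {
        fileQueue.add(fileStat.getPath());
      }
    }
  }
  return filePathList;
}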
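
To see the order in which the queue-based traversal visits paths without needing a cluster, here is a minimal sketch that swaps the Hadoop FileSystem for a hypothetical in-memory map from each directory to its children. The class name QueueListingSketch, the method listAllFiles, and the sample paths are illustrative assumptions and not part of the Hadoop API; only the traversal logic mirrors the code above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

class QueueListingSketch {

    // Hypothetical stand-in for an HDFS namespace: every directory is a key
    // that maps to its direct children. A path that is not a key is a file.
    public static List<String> listAllFiles(String root, Map<String, List<String>> tree) {
        List<String> files = new ArrayList<>();
        Queue<String> pending = new ArrayDeque<>();
        pending.add(root);
        while (!pending.isEmpty()) {
            String path = pending.remove();
            List<String> children = tree.get(path);
            if (children == null) {
                // Not a directory in this model, so record it as a file.
                files.add(path);
            } else {
                // A directory: enqueue its children for a later visit.
                pending.addAll(children);
            }
        }
        return files;
    }

    public static void main(String[] args) {
        Map<String, List<String>> tree = new HashMap<>();
        tree.put("/data", Arrays.asList("/data/a.txt", "/data/sub", "/data/empty"));
        tree.put("/data/sub", Arrays.asList("/data/sub/b.txt", "/data/sub/deeper"));
        tree.put("/data/sub/deeper", Arrays.asList("/data/sub/deeper/c.txt"));
        tree.put("/data/empty", new ArrayList<String>());
        System.out.println(listAllFiles("/data", tree));
    }
}
```

Running main prints [/data/a.txt, /data/sub/b.txt, /data/sub/deeper/c.txt]. Files come out level by level, and the empty directory contributes nothing to the result.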
answered Dec 4, 2018 by Ishwar

