Hadoop hdfs: list all files in a directory and its subdirectories

0 votes
I have a folder in my hdfs which has subfolders and files in the them. I want to know how I cant list all of these. please help. Thanks
Oct 26, 2018 in Big Data Hadoop by slayer
• 29,170 points

recategorized Oct 26, 2018 by Omkar 1,897 views

5 answers to this question.

0 votes

You can use this:

 Configuration conf = getConf();
    Job job = Job.getInstance(conf);
    FileSystem fs = FileSystem.get(conf);

    //the second boolean parameter here sets the recursion to true
    RemoteIterator<LocatedFileStatus> fileStatusListIterator = fs.listFiles(
            new Path("path/to/lib"), true);
    while(fileStatusListIterator.hasNext()){
        LocatedFileStatus fileStatus = fileStatusListIterator.next();
        //do stuff with the file like ...
        job.addFileToClassPath(fileStatus.getPath());
    }
answered Oct 26, 2018 by Omkar
• 67,620 points
0 votes
private int calculateNumberOfReducers(String input) throws IOException {
    int numberOfReducers = 0;
    Path inputPath = new Path(input);
    FileSystem fs = inputPath.getFileSystem(getConf());
    FileStatus[] statuses = fs.globStatus(inputPath);
    for(FileStatus status: statuses) {
        if(status.isDirectory()) {
            numberOfReducers += getNumberOfInputFiles(status, fs);
        } else if(status.isFile()) {
            numberOfReducers ++;
        }
    }
    return numberOfReducers;
}

/**
 * Recursively determines number of input files in an HDFS directory
 *
 * @param status instance of FileStatus
 * @param fs instance of FileSystem
 * @return number of input files within particular HDFS directory
 * @throws IOException
 */
private int getNumberOfInputFiles(FileStatus status, FileSystem fs) throws IOException  {
    int inputFileCount = 0;
    if(status.isDirectory()) {
        FileStatus[] files = fs.listStatus(status.getPath());
        for(FileStatus file: files) {
            inputFileCount += getNumberOfInputFiles(file, fs);
        }
    } else {
        inputFileCount ++;
    }

    return inputFileCount;
}
answered Dec 4, 2018 by Jayant
0 votes

You can do it using queue:

private static List<String> listAllFilePath(Path hdfsFilePath, FileSystem fs)
throws FileNotFoundException, IOException {
  List<String> filePathList = new ArrayList<String>();
  Queue<Path> fileQueue = new LinkedList<Path>();
  fileQueue.add(hdfsFilePath);
  while (!fileQueue.isEmpty()) {
    Path filePath = fileQueue.remove();
    if (fs.isFile(filePath)) {
      filePathList.add(filePath.toString());
    } else {
      FileStatus[] fileStatus = fs.listStatus(filePath);
      for (FileStatus fileStat : fileStatus) {
        fileQueue.add(fileStat.getPath());
      }
    }
  }
  return filePathList;
}
answered Dec 4, 2018 by Ishwar
0 votes
hdfs dfs -ls -R /.../...
answered Jul 19 by Richard
0 votes

Hi,

You can try this command:

hadoop fs -ls -d
answered Aug 1 by Dinish

Related Questions In Big Data Hadoop

0 votes
5 answers
0 votes
1 answer
0 votes
1 answer

How can we list files in HDFS directory as per timestamp?

No, there is no other option to ...READ MORE

answered May 8, 2018 in Big Data Hadoop by nitinrawat895
• 10,670 points
912 views
+1 vote
2 answers

What does hadoop fs -du command gives as output?

du command is used for to see ...READ MORE

answered Jul 23 in Big Data Hadoop by Lokesh Singh
968 views
0 votes
1 answer

How can I write text in HDFS using CMD?

Hadoop put & appendToFile only reads standard ...READ MORE

answered Apr 27, 2018 in Big Data Hadoop by Shubham
• 13,300 points
85 views
0 votes
1 answer

What is the command to find the free space in HDFS?

You can use dfsadmin which runs a ...READ MORE

answered Apr 29, 2018 in Big Data Hadoop by Shubham
• 13,300 points
205 views
0 votes
1 answer

How to find the used cache in HDFS

hdfs dfsadmin -report This command tells fs ...READ MORE

answered May 4, 2018 in Big Data Hadoop by Shubham
• 13,300 points
238 views
0 votes
1 answer

How to list files in hdfs that contains a specific string?

Yes, you can do this. You can ...READ MORE

answered Jan 27 in Big Data Hadoop by Omkar
• 67,620 points
697 views
0 votes
3 answers

Spark Scala: How to list all folders in directory

val spark = SparkSession.builder().appName("Demo").getOrCreate() val path = new ...READ MORE

answered Dec 4, 2018 in Big Data Hadoop by Mark
2,091 views