How many FSimage files will be created in hard disk?

–1 vote

When RAM contains Metadata information (say 5 GB), how many FSimage files will be created in hard disk?

Dec 20, 2018 in Big Data Hadoop by digger
• 26,550 points
105 views

1 answer to this question.

0 votes

In Hdfs, data and metadata are decoupled. Data files are split into block files that are stored, and replicated, on DataNodes across the cluster. The filesystem namespace tree and associated metadata are stores on the Namenode.

Let’s say we have a file called “file.txt” that is 1GB (1000MB) and our block size is 128MB. We will end up with 7 128MB blocks and a 104MB block. The NameNode keeps track of the fact that “file.txt” in HDFS maps to these eight blocks and three replicas of each block. DataNodes store blocks, not files, so the mapping is important to understanding where our data is and what our data is.

Corresponding to a block 150 bytes (roughly) of metadata is created, Since there are 8 blocks with replication factor 3 i.e. 24 blocks. Hence 150x24 = 3600 bytes of metadata will be created.

On disk, the NameNode stores the metadata for the file system. This includes file and directory permissions, ownerships, and assigned blocks in the fsimage and the edit logs. In properly configured setups, it also includes a list of DataNodes that make up the HDFS (dfs.include parameter) and DataNodes that are to be removed from that list (dfs.exclude parameter). Note that which DataNodes have which blocks is only stored in memory and not on disk.

Block size by default is 128 MB so you can do the calculation pertaining to how much RAM will support how many files. To guarantee persistence of the filesystem metadata the NN has to keep a copy of its memory structures on disk also the NN dirs and they will hold the fsimage and editlogs. Editlogs captures all changes that are happening to HDFS (such as new files and directories), think redo logs that most RDBM's use. The fsimage is a full snapshot of the metadata state. The fsimage file will not grow beyond the allocated NN memory set and the edit logs will get rotated once it hits a specific size.

answered Dec 20, 2018 by Omkar
• 67,660 points

Related Questions In Big Data Hadoop

0 votes
1 answer
0 votes
1 answer

When hadoop-env.sh will be executed in hadoop

Yes you need to put in the ...READ MORE

answered Apr 3, 2018 in Big Data Hadoop by kurt_cobain
• 9,260 points
112 views
0 votes
1 answer

How can we list files in HDFS directory as per timestamp?

No, there is no other option to ...READ MORE

answered May 8, 2018 in Big Data Hadoop by nitinrawat895
• 10,710 points
1,060 views
0 votes
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,710 points
3,339 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,710 points
399 views
0 votes
10 answers

hadoop fs -put command?

put syntax: put <localSrc> <dest> copy syntax: copyFr ...READ MORE

answered Dec 7, 2018 in Big Data Hadoop by Aditya
16,539 views
0 votes
1 answer

Hadoop dfs -ls command?

In your case there is no difference ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by kurt_cobain
• 9,260 points
1,208 views
0 votes
1 answer

If there are two joins in hive, how many mapreduce jobs will run?

There are two conditions for no. of ...READ MORE

answered Dec 19, 2018 in Big Data Hadoop by Omkar
• 67,660 points
171 views
0 votes
1 answer

How to read HDFS and local files with the same code in Java?

You can try something like this: ​ ...READ MORE

answered Nov 22, 2018 in Big Data Hadoop by Omkar
• 67,660 points
744 views