How to calculate metadata size?

0 votes
Let's suppose we are getting 5 TB of data in our cluster on a daily basis. We have an active-passive NameNode architecture configured with JournalNodes. How can we calculate how much metadata we need to manage?

OR

If we are getting 5 GB of data on a daily basis, how much metadata will be generated in our cluster?

Please guide me.
Feb 7 in Big Data Hadoop by Yamuna

edited Feb 7 by Omkar

1 answer to this question.

0 votes

In Hadoop, the NameNode consumes about 150 bytes to store each block's metadata and another 150 bytes for each file's metadata. Suppose your cluster's block size is 128 MB and each of your 100 files is around 100 MB: each file fits in a single block, so each file consumes 300 bytes of NameNode memory (150 bytes for the file entry plus 150 bytes for its one block). In total, the NameNode will consume 300 * 100 = 30,000 bytes. This assumes a replication factor of 1.

In your case, 5 GB = 5120 MB of data arrives every day, i.e., 40 blocks if we take your block size as 128 MB (5120 / 128 = 40). With the default replication factor of 3, the total number of blocks will be 40 * 3 = 120.

So the block metadata consumed should be about 150 bytes * 120 = 18,000 bytes per day (file metadata adds roughly another 150 bytes per file on top of that). This will vary with your cluster's replication factor and block size.
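The arithmetic above can be sketched as a small helper. Note this is only an estimate: the 150-bytes-per-block figure is a rule of thumb, not an exact measurement, and the function name and defaults below are illustrative, not part of any Hadoop API.

```python
import math

def namenode_metadata_bytes(data_mb, block_size_mb=128, replication=3,
                            bytes_per_block=150):
    """Estimate NameNode block-metadata usage (rule of thumb: ~150 bytes/block)."""
    blocks = math.ceil(data_mb / block_size_mb)  # blocks before replication
    total_blocks = blocks * replication          # block replicas tracked by the NameNode
    return total_blocks * bytes_per_block

# 5 GB/day with 128 MB blocks and 3x replication:
# 5120 / 128 = 40 blocks, * 3 = 120 blocks, * 150 bytes = 18,000 bytes
print(namenode_metadata_bytes(5120))  # prints 18000
```

Changing `replication` or `block_size_mb` shows how the daily metadata footprint scales with cluster configuration.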

answered Feb 7 by Omkar
• 67,660 points
