How to calculate metadata size?

0 votes
Let's suppose we are getting 5 TB of data in our cluster on a daily basis. We have an active-passive NameNode architecture configured with JournalNodes. How can we calculate how much metadata we need to manage?

OR

If we are getting 5 GB of data on a daily basis, how much metadata will be generated in our cluster?

Please guide me.
Feb 7 in Big Data Hadoop by Yamuna

edited Feb 7 by Omkar

1 answer to this question.

0 votes

In Hadoop, the NameNode consumes about 150 bytes to store each block's metadata and another 150 bytes for each file's metadata. Suppose your cluster's block size is 128 MB and each of your 100 files is around 100 MB: each file fits in a single block, so each file consumes 300 bytes of NameNode memory (150 bytes for the file entry plus 150 bytes for its one block). In total, the NameNode will consume 300 * 100 = 30,000 bytes. This assumes a replication factor of 1.

In your case, 5 GB = 5120 MB of data arrives every day, i.e., 40 blocks if we take your block size as 128 MB (5120 / 128 = 40). With the default replication factor of 3, the total number of blocks will be 40 * 3 = 120.

So the block metadata consumed should be about 150 bytes * 120 = 18,000 bytes per day (file metadata adds roughly another 150 bytes per file on top of that). This will vary with your cluster's replication factor and block size.
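The arithmetic above can be sketched as a small helper. Note this is only an estimate: the 150-bytes-per-block figure is a rule of thumb, not an exact measurement, and the function name and defaults below are illustrative, not part of any Hadoop API.

```python
import math

def namenode_metadata_bytes(data_mb, block_size_mb=128, replication=3,
                            bytes_per_block=150):
    """Estimate NameNode block-metadata usage (rule of thumb: ~150 bytes/block)."""
    blocks = math.ceil(data_mb / block_size_mb)  # blocks before replication
    total_blocks = blocks * replication          # block replicas tracked by the NameNode
    return total_blocks * bytes_per_block

# 5 GB/day with 128 MB blocks and 3x replication:
# 5120 / 128 = 40 blocks, * 3 = 120 blocks, * 150 bytes = 18,000 bytes
print(namenode_metadata_bytes(5120))  # prints 18000
```

Changing `replication` or `block_size_mb` shows how the daily metadata footprint scales with cluster configuration.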

answered Feb 7 by Omkar
• 67,660 points
