How to calculate metadata size in a Hadoop cluster

0 votes
Suppose we receive 5 TB of data in our cluster every day. We have an active-passive NameNode architecture configured with JournalNodes. How can we calculate how much metadata we need to manage?

OR

If we receive 5 GB of data every day, how much metadata will be generated in our cluster?

Please guide me.
Feb 7, 2019 in Big Data Hadoop by Yamuna

edited Feb 7, 2019 by Omkar 1,793 views

1 answer to this question.

0 votes

In Hadoop, the NameNode uses about 150 bytes of memory for each block's metadata and about 150 bytes for each file's metadata. Suppose your cluster's block size is 128 MB and you have 100 files of about 100 MB each. Each file fits in a single block, so it consumes about 300 bytes of NameNode memory (150 for the file entry plus 150 for its block). In total, the NameNode will consume about 300 * 100 = 30,000 bytes. This assumes a replication factor of 1.
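The arithmetic for the 100-file example can be written as a small Python sketch. The function name `namenode_bytes` and the constant `BYTES_PER_OBJECT` are made up for illustration, and the 150-byte figure is the rule of thumb quoted above, not a measured value.

```python
import math

# Rule-of-thumb figure from the answer above; an approximation, not a measurement.
BYTES_PER_OBJECT = 150


def namenode_bytes(num_files, file_size_mb, block_size_mb=128):
    """Estimate NameNode memory for num_files files of equal size (replication 1)."""
    # A file smaller than one block still occupies one block entry.
    blocks_per_file = math.ceil(file_size_mb / block_size_mb)
    # One entry for the file itself plus one entry per block.
    per_file = BYTES_PER_OBJECT + blocks_per_file * BYTES_PER_OBJECT
    return num_files * per_file


print(namenode_bytes(100, 100))  # 30000
```

Under the same assumptions, a 200 MB file spans two blocks, so it costs about 450 bytes instead of 300.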

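In your case, 5 GB = 5120 MB of data arrives every day, which is 40 blocks at a block size of 128 MB (5120 / 128 = 40). With a replication factor of 3 (the default), the total number of blocks is 40 * 3 = 120.

So the block metadata should come to about 150 bytes * 120 = 18,000 bytes per day. This will vary with the replication factor and block size of your cluster, and with the number of files the data arrives in, since each file adds about 150 bytes for its own entry. Counting every replica as a full 150-byte entry is on the conservative side, because the NameNode keeps one entry per block and records the replica locations along with it.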
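To apply the same calculation to other daily volumes, including the 5 TB case from the question, here is a minimal Python sketch. It follows the arithmetic above (block entries multiplied by the replication factor, 150 bytes each) and assumes the data arrives in files large enough to fill whole blocks; many small files would produce more blocks and more file entries. The function name `daily_block_metadata_bytes` is hypothetical.

```python
import math


def daily_block_metadata_bytes(daily_mb, block_size_mb=128,
                               replication=3, bytes_per_block=150):
    """Estimate daily block metadata using the answer's rule of thumb."""
    # Number of blocks needed to hold one day's data.
    blocks = math.ceil(daily_mb / block_size_mb)
    # The answer counts one 150-byte entry per replica.
    return blocks * replication * bytes_per_block


five_gb = daily_block_metadata_bytes(5 * 1024)         # 5 GB expressed in MB
five_tb = daily_block_metadata_bytes(5 * 1024 * 1024)  # 5 TB expressed in MB

print(five_gb)  # 18000
print(five_tb)  # 18432000
```

By this estimate, 5 TB per day adds roughly 18 MB of block metadata per day, before counting file and directory entries.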

answered Feb 7, 2019 by Omkar
