What is the use of sequence file in Hadoop?

0 votes
I read about the sequence file format in a few blogs. Since I am still new to Hadoop, I can't work out what sequence files are actually for. It would be really helpful if someone could explain what a sequence file is and where it is used in Hadoop.
Apr 5, 2018 in Big Data Hadoop by Damon Salvatore
• 5,250 points
380 views

1 answer to this question.

0 votes

Sequence files are binary files containing serialized key/value pairs. You can compress a sequence file at the record level (each record's value is compressed on its own) or at the block level (a batch of records is compressed together). This is one of the advantages of using sequence files. Also, because sequence files are binary, they are generally faster to read and write than text files.
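To get a feel for why the compression level matters, here is a minimal sketch that uses only `java.util.zip.Deflater` from the JDK. It is not Hadoop's codec path; the class name `CompressionGranularity` and the sample log lines are made up for illustration. It compares compressing every value separately (similar in spirit to record compression) with compressing all values together (similar in spirit to block compression).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

// Illustration only: plain JDK deflate, not Hadoop's SequenceFile codecs.
class CompressionGranularity {

    /** Returns the deflate-compressed size of the input, in bytes. */
    static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    /** Total size when each value is compressed on its own. */
    static int recordLevelSize(List<byte[]> values) {
        int total = 0;
        for (byte[] value : values) {
            total += compressedSize(value);
        }
        return total;
    }

    /** Size when all values are concatenated and compressed together. */
    static int blockLevelSize(List<byte[]> values) {
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        for (byte[] value : values) {
            all.write(value, 0, value.length);
        }
        return compressedSize(all.toByteArray());
    }

    /** Hypothetical sample data: many short, similar-looking log lines. */
    static List<byte[]> sampleValues(int count) {
        List<byte[]> values = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String line = "user=" + (i % 50) + ",event=click,page=/home/item/" + i;
            values.add(line.getBytes(StandardCharsets.UTF_8));
        }
        return values;
    }

    public static void main(String[] args) {
        List<byte[]> values = sampleValues(1000);
        System.out.println("record-level bytes: " + recordLevelSize(values));
        System.out.println("block-level bytes:  " + blockLevelSize(values));
    }
}
```

With many short, repetitive values, compressing them together usually gives a noticeably smaller result, because the compressor can reuse patterns across records and pays its fixed overhead only once. The trade-off is that a reader has to decompress a whole block to get at one record.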

Problem With Small Files and Hadoop

One of the main problems the sequence file format solves is processing too many small files in Hadoop. Hadoop handles large numbers of small files poorly because the NameNode keeps the metadata for every file and block in memory, so a huge number of small files creates a lot of overhead for the NameNode. Besides this memory overhead, the second problem is the number of mappers: each small file is smaller than a block and typically gets its own mapper, so far more mappers are launched than the amount of data justifies.
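A rough back-of-the-envelope calculation makes both problems concrete. The sketch below rests on assumptions that are not from this answer: a commonly quoted rule of thumb of about 150 bytes of NameNode heap per file or block object, a 128 MiB block size, one map task per block with the default input format, and a hypothetical workload of one million 100 KiB files. It also ignores the sequence file's own header and sync overhead and any compression.

```java
// Back-of-the-envelope estimate; every constant here is an assumption.
class SmallFilesEstimate {

    // Assumption: rule-of-thumb heap cost per NameNode object (file or block).
    static final long BYTES_PER_OBJECT = 150;

    /** Number of blocks needed for a non-empty file (ceiling division). */
    static long blocksFor(long fileSize, long blockSize) {
        return (fileSize + blockSize - 1) / blockSize;
    }

    /** NameNode objects: one per file plus one per block of that file. */
    static long namenodeObjects(long fileCount, long fileSize, long blockSize) {
        return fileCount * (1 + blocksFor(fileSize, blockSize));
    }

    /** Estimated NameNode heap for the given number of objects. */
    static long heapBytes(long objects) {
        return objects * BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // 128 MiB (assumed)
        long fileCount = 1_000_000;          // hypothetical workload
        long fileSize = 100L * 1024;         // 100 KiB per file

        // Case 1: every small file stored on its own.
        long smallObjects = namenodeObjects(fileCount, fileSize, blockSize);
        long smallMappers = fileCount * blocksFor(fileSize, blockSize);

        // Case 2: the same bytes packed into one large file.
        long packedSize = fileCount * fileSize;
        long packedObjects = namenodeObjects(1, packedSize, blockSize);
        long packedMappers = blocksFor(packedSize, blockSize);

        System.out.println("small files: " + smallObjects + " objects, "
                + heapBytes(smallObjects) + " heap bytes, " + smallMappers + " map tasks");
        System.out.println("one packed file: " + packedObjects + " objects, "
                + heapBytes(packedObjects) + " heap bytes, " + packedMappers + " map tasks");
    }
}
```

Under these assumptions the small files need 2,000,000 NameNode objects and 1,000,000 map tasks, while the single packed file needs 764 objects and 763 map tasks for the same data.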

Solution: Sequence File

Sequence files let you solve this small-files problem. As discussed, a sequence file contains key-value pairs, so you can store many small files in it, where the key is unique file metadata such as filename+timestamp and the value is the content of the ingested file. This way you can combine many small files into a single file and use it as input for MapReduce. This is why sequence files are often used in custom-written MapReduce programs.
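The packing idea can be sketched with plain JDK streams. This is a simplified stand-in, not Hadoop's real on-disk format: the class `SmallFilePacker` and its length-prefixed record layout are invented for illustration. In an actual job you would write the pairs through Hadoop's `SequenceFile.Writer`, for example with `Text` keys and `BytesWritable` values.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-in for the key/value packing idea; NOT the SequenceFile format.
class SmallFilePacker {

    /** Packs (key -> file content) entries into one byte array of records. */
    static byte[] pack(Map<String, byte[]> files) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> entry : files.entrySet()) {
                out.writeUTF(entry.getKey());           // key, e.g. filename+timestamp
                out.writeInt(entry.getValue().length);  // length of the value
                out.write(entry.getValue());            // value: raw file content
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the records back in the order they were written. */
    static Map<String, byte[]> unpack(byte[] packed) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(packed))) {
            while (in.available() > 0) {
                String key = in.readUTF();
                byte[] value = new byte[in.readInt()];
                in.readFully(value);
                files.put(key, value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }
}
```

Each record keeps its key next to its content, so a mapper reading the packed file can still tell which original file a value came from.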

Let me know if anything is still unclear.

answered Apr 5, 2018 by Ashish
• 2,630 points
