What is the use of sequence file in Hadoop?

I have read about the sequence file format in a few blogs. Since I am still new to Hadoop, I don't really understand what the application or purpose of sequence files is. It would be really helpful if anyone could explain what a sequence file actually is and where it is used in Hadoop.
Apr 5, 2018 in Big Data Hadoop by Damon Salvatore

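Sequence files are binary files containing serialized key/value pairs. You can compress a sequence file at the record level (each record's value is compressed on its own) or at the block level (the keys and values of many records are buffered and compressed together). This is one of the advantages of using sequence files. Also, because sequence files are binary, records do not have to be parsed from text, so they can be read and written faster than the text file format. They also stay splittable when compressed, because sync markers are written periodically in the file.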
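To see why block-level compression usually beats record-level compression, here is a minimal sketch in plain Java (`java.util.zip` only, no Hadoop). It is a simplified stand-in, not the SequenceFile on-disk format: it compresses only the values, whereas Hadoop's block compression also compresses the keys. The class name `RecordVsBlockCompression` and the sample records are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

public class RecordVsBlockCompression {

    // DEFLATE-compress one buffer and return the compressed size in bytes.
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] chunk = new byte[4096];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(chunk);
            }
            return total;
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    // Record-level style: every value is compressed independently,
    // so redundancy between records cannot be exploited.
    static int recordLevelSize(List<byte[]> values) {
        int total = 0;
        for (byte[] v : values) {
            total += deflatedSize(v);
        }
        return total;
    }

    // Block-level style: a batch of values is buffered and compressed as one unit.
    static int blockLevelSize(List<byte[]> values) {
        ByteArrayOutputStream block = new ByteArrayOutputStream();
        for (byte[] v : values) {
            block.write(v, 0, v.length);
        }
        return deflatedSize(block.toByteArray());
    }

    // Hypothetical sample data: 1,000 similar log-style records.
    static List<byte[]> sampleValues() {
        List<byte[]> values = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            String line = "2018-04-05 INFO request id=" + i + " status=OK";
            values.add(line.getBytes(StandardCharsets.UTF_8));
        }
        return values;
    }

    public static void main(String[] args) {
        List<byte[]> values = sampleValues();
        System.out.println("record-level compressed bytes: " + recordLevelSize(values));
        System.out.println("block-level compressed bytes:  " + blockLevelSize(values));
    }
}
```

Because the records share most of their text, compressing them as one batch lets DEFLATE exploit repetition across records, so the block-level total should come out much smaller than the sum of the individually compressed records.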

Problem With Small Files and Hadoop

One of the main problems the sequence file format solves is processing too many small files in Hadoop. Hadoop is not good at processing a large number of small files, because the NameNode keeps every file, directory and block as an object in memory, so many small files generate a lot of memory overhead on the NameNode. Besides this memory overhead, the second problem is the number of mappers: an input split never spans two files, so each file gets at least one map task, and a file smaller than a block still costs a whole mapper.
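A back-of-the-envelope sketch of both costs, assuming the commonly quoted rule of thumb of roughly 150 bytes of NameNode heap per file, directory or block object, and the 128 MB default block size of Hadoop 2.x and later. The workload figures and the class `SmallFilesEstimate` are illustrative assumptions, not measurements. By the same rule, 10 million single-block files need about 10,000,000 × 2 × 150 bytes ≈ 3 GB of heap.

```java
import java.util.Arrays;

public class SmallFilesEstimate {

    // Rule of thumb (assumption, not an exact figure): each file, directory
    // and block object costs roughly 150 bytes of NameNode heap.
    static final long BYTES_PER_OBJECT = 150L;

    // Default HDFS block size in Hadoop 2.x and later: 128 MB.
    static final long BLOCK_SIZE = 128L * 1024 * 1024;

    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    // Estimated heap for `files` files of `blocksPerFile` blocks each
    // (one file object plus one object per block; directories ignored).
    static long nameNodeHeapBytes(long files, long blocksPerFile) {
        return files * (1 + blocksPerFile) * BYTES_PER_OBJECT;
    }

    // Simplified split count: a split never spans two files, so every file
    // yields at least one split, i.e. at least one map task. Ignores the
    // small "slop" allowance the real FileInputFormat applies.
    static long mapTasks(long[] fileSizes, long splitSize) {
        long tasks = 0;
        for (long size : fileSizes) {
            tasks += Math.max(1L, ceilDiv(size, splitSize));
        }
        return tasks;
    }

    public static void main(String[] args) {
        // Hypothetical workload: 10,000 files of 100 KB each.
        long[] small = new long[10_000];
        Arrays.fill(small, 100L * 1024);
        // The same bytes packed into one sequence file (record overhead ignored).
        long packedSize = 10_000L * 100 * 1024;
        long packedBlocks = ceilDiv(packedSize, BLOCK_SIZE);

        System.out.println("map tasks, small files: " + mapTasks(small, BLOCK_SIZE));
        System.out.println("map tasks, packed:      " + mapTasks(new long[] { packedSize }, BLOCK_SIZE));
        System.out.println("heap, small files: " + nameNodeHeapBytes(10_000, 1) + " bytes");
        System.out.println("heap, packed:      " + nameNodeHeapBytes(1, packedBlocks) + " bytes");
    }
}
```

Packing does not shrink the data, but it turns thousands of file and block objects (and thousands of mappers) into a handful.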

Solution: Sequence File

Sequence files let you solve this small-files problem. As discussed, a sequence file contains key/value pairs, so you can use one to hold many records where the key is unique file metadata, such as filename+timestamp, and the value is the content of the ingested file. This way you can combine many small files into a single file and then use it as the input for MapReduce processing. This is why sequence files are often used in custom-written MapReduce programs.
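A minimal sketch of the packing step using Hadoop's `SequenceFile.Writer` options API (Hadoop 2.x or later, with hadoop-common on the classpath). The class and method names, the `<name>_<lastModified>` key layout and the choice of BLOCK compression are assumptions for illustration; reading the small files from a local directory is also an assumption, since they could just as well already sit in HDFS. The output `Path` is resolved against `fs.defaultFS`, so with a cluster configuration it is written to HDFS.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SmallFilesToSequenceFile {

    // Packs every regular file in a local directory into one SequenceFile.
    // Key: "<file name>_<last-modified millis>" (the filename+timestamp idea);
    // value: the raw bytes of the file.
    public static int pack(File inputDir, Path output, Configuration conf) throws IOException {
        File[] files = inputDir.listFiles(File::isFile);
        if (files == null) {
            throw new IOException("Not a readable directory: " + inputDir);
        }
        Arrays.sort(files); // deterministic record order

        int written = 0;
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(output),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class),
                // BLOCK: keys and values of many records are compressed together.
                SequenceFile.Writer.compression(SequenceFile.CompressionType.BLOCK))) {
            for (File f : files) {
                byte[] content = Files.readAllBytes(f.toPath());
                writer.append(new Text(f.getName() + "_" + f.lastModified()),
                              new BytesWritable(content));
                written++;
            }
        }
        return written;
    }

    // Reads the packed file back and returns its keys in write order.
    public static List<String> readKeys(Path input, Configuration conf) throws IOException {
        List<String> keys = new ArrayList<>();
        try (SequenceFile.Reader reader =
                     new SequenceFile.Reader(conf, SequenceFile.Reader.file(input))) {
            Text key = new Text();
            BytesWritable value = new BytesWritable();
            while (reader.next(key, value)) {
                keys.add(key.toString());
            }
        }
        return keys;
    }

    // Usage (hypothetical paths): java SmallFilesToSequenceFile /local/small-files /tmp/packed.seq
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        Path output = new Path(args[1]);
        int n = pack(new File(args[0]), output, conf);
        System.out.println("Packed " + n + " files into " + output);
        readKeys(output, conf).forEach(System.out::println);
    }
}
```

In a MapReduce job the packed file can then be read with `SequenceFileInputFormat`, which hands each (`Text`, `BytesWritable`) pair to the mapper, so the number of mappers follows the number of splits of one large file rather than the number of small files. Note that each small file is held in memory as a single record, so this approach suits files that are genuinely small.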

Let me know if anything is still unclear.

answered Apr 5, 2018 by Ashish
In the above answer, how are we combining multiple small text files into a single key/value sequence file? Kindly explain.

Hi @sujith,

You can go through the link below.

https://hadooptutorial.info/merging-small-files-into-sequencefile/
