How to convert .txt file to Hadoop's sequence file format

0 votes

I need data to be stored in hadoop's sequence file format. However,currently the data is only in flat .txt format.Can anyone suggest a way i can convert a .txt file to a sequence file?

Sep 28, 2018 in Big Data Hadoop by digger
• 26,600 points
885 views

2 answers to this question.

0 votes

So the way more simplest answer is just an "identity" job that has a SequenceFile output.

Looks like this in java:

    public static void main(String[] args) throws IOException,
        InterruptedException, ClassNotFoundException {

    Configuration conf = new Configuration();
    Job job = new Job(conf);
    job.setJobName("Convert Text");
    job.setJarByClass(Mapper.class);

    job.setMapperClass(Mapper.class);
    job.setReducerClass(Reducer.class);

    // increase if you need sorting or a special number of files
    job.setNumReduceTasks(0);

    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);

    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    job.setInputFormatClass(TextInputFormat.class);

    TextInputFormat.addInputPath(job, new Path("/lol"));
    SequenceFileOutputFormat.setOutputPath(job, new Path("/lolz"));

    // submit and wait for completion
    job.waitForCompletion(true);
   }
answered Sep 28, 2018 by slayer
• 29,240 points
0 votes
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

//White, Tom (2012-05-10). Hadoop: The Definitive Guide (Kindle Locations 5375-5384). OReilly Media - A. Kindle Edition. 

public class SequenceFileWriteDemo { 

    private static final String[] DATA = { "One, two, buckle my shoe", "Three, four, shut the door", "Five, six, pick up sticks", "Seven, eight, lay them straight", "Nine, ten, a big fat hen" };

    public static void main( String[] args) throws IOException { 
        String uri = args[ 0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create( uri), conf);
        Path path = new Path( uri);
        IntWritable key = new IntWritable();
        Text value = new Text();
        SequenceFile.Writer writer = null;
        try { 
            writer = SequenceFile.createWriter( fs, conf, path, key.getClass(), value.getClass());
            for (int i = 0; i < 100; i ++) { 
                key.set( 100 - i);
                value.set( DATA[ i % DATA.length]);
                System.out.printf("[% s]\t% s\t% s\n", writer.getLength(), key, value); 
                writer.append( key, value); } 
        } finally 
        { IOUtils.closeStream( writer); 
        } 
    } 
}
answered Oct 12, 2018 by Sanjay

Related Questions In Big Data Hadoop

0 votes
1 answer

How to create a Hive table with a sequence file?

In Hive we can create a sequence ...READ MORE

answered Dec 17, 2018 in Big Data Hadoop by Omkar
• 68,480 points
606 views
0 votes
1 answer

How to create a Hive table from sequence file stored in HDFS?

There are two SerDe for SequenceFile as ...READ MORE

answered Dec 17, 2018 in Big Data Hadoop by Omkar
• 68,480 points
715 views
0 votes
1 answer

How to change file format using Sqoop?

For change in the file format, you ...READ MORE

answered Dec 17, 2018 in Big Data Hadoop by Omkar
• 68,480 points
318 views
0 votes
7 answers

How to run a jar file in hadoop?

I used this command to run my ...READ MORE

answered Dec 10, 2018 in Big Data Hadoop by Dasinto
6,603 views
0 votes
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,800 points
3,573 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,800 points
453 views
+1 vote
11 answers

hadoop fs -put command?

put syntax: put <localSrc> <dest> copy syntax: copyFr ...READ MORE

answered Dec 7, 2018 in Big Data Hadoop by Aditya
18,416 views
0 votes
1 answer

Hadoop dfs -ls command?

In your case there is no difference ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by kurt_cobain
• 9,280 points
1,332 views
0 votes
1 answer

How to exit name node from safe mode?

In order to forcefully let the namenode ...READ MORE

answered Sep 26, 2018 in Big Data Hadoop by slayer
• 29,240 points
189 views
0 votes
1 answer

Spark - load CSV file as DataFrame?

spark-csv is part of core Spark functionality ...READ MORE

answered Sep 25, 2018 in Big Data Hadoop by slayer
• 29,240 points
3,551 views