Setting textinputformat.record.delimiter in spark

0 votes

In Spark, it is possible to set some hadoop configuration settings like, e.g.

System.setProperty("spark.hadoop.dfs.replication", "1")

This works, the replication factor is set to 1. Assuming that this is the case, I thought that this pattern (prepending "spark.hadoop." to a regular hadoop configuration property), would also work for the textinputformat.record.delimiter:

System.setProperty("spark.hadoop.textinputformat.record.delimiter", "\n\n")

However, it seems that spark just ignores this setting. Do I set the textinputformat.record.delimiter in the correct way? Is there a simpler way of setting the textinputformat.record.delimiter. I would like to avoid writing my own InputFormat, since I really only need to obtain records delimited by two newlines.

Oct 10, 2018 in Big Data Hadoop by slayer
• 29,270 points
677 views

1 answer to this question.

0 votes

I got this working with plain uncompressed files with the below function.

import org.apache.hadoop.io.LongWritable
import org.apache.hadoop.io.Text
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

def nlFile(path: String) = {
    val conf = new Configuration
    conf.set("textinputformat.record.delimiter", "\n")
    sc.newAPIHadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text], conf)
      .map(_._2.toString)
}
answered Oct 10, 2018 by Omkar
• 69,000 points

Related Questions In Big Data Hadoop

0 votes
1 answer

What is the command to check the number of cores in Spark?

Go to your Spark Web UI & ...READ MORE

answered May 16, 2018 in Big Data Hadoop by Shubham
• 13,380 points
1,023 views
0 votes
1 answer

What is the Data format and database choices in Hadoop and Spark?

Use Parquet. I'm not sure about CSV ...READ MORE

answered Sep 4, 2018 in Big Data Hadoop by Frankie
• 9,810 points
113 views
0 votes
1 answer

Where can I find logs in Spark on YARN?

You can access logs through the command yarn ...READ MORE

answered Nov 8, 2018 in Big Data Hadoop by Omkar
• 69,000 points
81 views
0 votes
1 answer

Hadoop Spark: What is version to find SparkSession in library Spark?

you need both core and SQL artifacts <repositories> ...READ MORE

answered Nov 13, 2018 in Big Data Hadoop by Omkar
• 69,000 points
292 views
+1 vote
1 answer
+1 vote
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,870 points
4,515 views
+1 vote
11 answers

hadoop fs -put command?

put syntax: put <localSrc> <dest> copy syntax: copyFr ...READ MORE

answered Dec 7, 2018 in Big Data Hadoop by Aditya
25,396 views
–1 vote
1 answer

Hadoop dfs -ls command?

In your case there is no difference ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by kurt_cobain
• 9,310 points
1,693 views
0 votes
3 answers

Spark Scala: How to list all folders in directory

val spark = SparkSession.builder().appName("Demo").getOrCreate() val path = new ...READ MORE

answered Dec 4, 2018 in Big Data Hadoop by Mark
4,099 views
0 votes
1 answer

How to save Spark dataframe as dynamic partitioned table in Hive?

Hey, you can try something like this: df.write.partitionBy('year', ...READ MORE

answered Nov 6, 2018 in Big Data Hadoop by Omkar
• 69,000 points
3,203 views