Setting textinputformat.record.delimiter in spark

0 votes

In Spark, it is possible to set some hadoop configuration settings like, e.g.

System.setProperty("spark.hadoop.dfs.replication", "1")

This works, the replication factor is set to 1. Assuming that this is the case, I thought that this pattern (prepending "spark.hadoop." to a regular hadoop configuration property), would also work for the textinputformat.record.delimiter:

System.setProperty("spark.hadoop.textinputformat.record.delimiter", "\n\n")

However, it seems that spark just ignores this setting. Do I set the textinputformat.record.delimiter in the correct way? Is there a simpler way of setting the textinputformat.record.delimiter. I would like to avoid writing my own InputFormat, since I really only need to obtain records delimited by two newlines.

Oct 10, 2018 in Big Data Hadoop by slayer
• 29,300 points

1 answer to this question.

0 votes

I got this working with plain uncompressed files with the below function.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

def nlFile(path: String) = {
    val conf = new Configuration
    conf.set("textinputformat.record.delimiter", "\n")
    sc.newAPIHadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text], conf)
answered Oct 10, 2018 by Omkar
• 69,030 points

Related Questions In Big Data Hadoop

0 votes
1 answer

What is the command to check the number of cores in Spark?

Go to your Spark Web UI & ...READ MORE

answered May 16, 2018 in Big Data Hadoop by Shubham
• 13,450 points
0 votes
1 answer

What is the Data format and database choices in Hadoop and Spark?

Use Parquet. I'm not sure about CSV ...READ MORE

answered Sep 4, 2018 in Big Data Hadoop by Frankie
• 9,810 points
0 votes
1 answer

Where can I find logs in Spark on YARN?

You can access logs through the command yarn ...READ MORE

answered Nov 8, 2018 in Big Data Hadoop by Omkar
• 69,030 points
0 votes
1 answer

Hadoop Spark: What is version to find SparkSession in library Spark?

you need both core and SQL artifacts <repositories> ...READ MORE

answered Nov 13, 2018 in Big Data Hadoop by Omkar
• 69,030 points
+1 vote
2 answers
+1 vote
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,950 points
+1 vote
11 answers

hadoop fs -put command?

put syntax: put <localSrc> <dest> copy syntax: copyF ...READ MORE

answered Dec 7, 2018 in Big Data Hadoop by Aditya
–1 vote
1 answer

Hadoop dfs -ls command?

In your case there is no difference ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by kurt_cobain
• 9,320 points
0 votes
3 answers

Spark Scala: How to list all folders in directory

val spark = SparkSession.builder().appName("Demo").getOrCreate() val path = new ...READ MORE

answered Dec 4, 2018 in Big Data Hadoop by Mark
0 votes
1 answer

How to save Spark dataframe as dynamic partitioned table in Hive?

Hey, you can try something like this: df.write.partitionBy('year', ...READ MORE

answered Nov 6, 2018 in Big Data Hadoop by Omkar
• 69,030 points