How do you load this multiline data in Spark as a single record?

TSV file data sample below.

Question:
Attempting to load a TSV file (sample data below) into Spark. The issue is that the value in the CallerAddress field, which is enclosed in double quotes, is split across two lines. Notice the opening double quote in "101 ... and the closing double quote on the next line, in STE 3305". How do you load this in Spark as a single record?

---------------------------------------------

CID CallLocation CallerLocation CallerAddress CallerCity CallerState CallerZip CallDateUTC Status CallDuration DateKey

211258030 GA, ATLANTA ATLANTA, GA "101 MARIETTA ST NW

STE 3305" ATLANTA GA 30303 2020-11-06 14:49:19 Answered 180 20201106

Nov 21, 2020 in Apache Spark by Ruben
• 180 points • 2,540 views

1 answer to this question.

Hi @Ruben,

I think you can add an escape option to get this working properly. You can add this option at the time of reading the file.

answered Nov 23, 2020 by MD
• 95,460 points

Related Questions In Apache Spark

+1 vote

1 answer

How to read a data from text file in Spark?

Hey, you can try this: from pyspark import SparkContext SparkContext.stop(sc) sc ...

answered Aug 6, 2019 in Apache Spark by Gitika
• 65,730 points • 5,312 views

+1 vote

1 answer

How to assign a column in Spark Dataframe (PySpark) as a Primary Key?

Spark does not have any concept of ...

answered Jan 12, 2020 in Apache Spark by Sirish
• 160 points • 15,184 views

0 votes

2 answers

In a Spark DataFrame how can I flatten the struct?

// Collect data from input avro file ...

answered Jul 4, 2019 in Apache Spark by Dhara dhruve
• 6,563 views

+1 vote

1 answer

How can I write a text file in HDFS not from an RDD, in Spark program?

Yes, you can go ahead and write ...

answered May 29, 2018 in Apache Spark by Shubham
• 13,490 points • 8,864 views

0 votes

1 answer

How to find the number of elements present in the array in a Spark DataFrame column?

You can select the column and apply ...

answered Jun 6, 2018 in Apache Spark by Shubham
• 13,490 points • 22,961 views

0 votes

1 answer

How does an RDD persist data in Spark?

There are two methods to persist the ...

answered Jun 18, 2018 in Apache Spark by nitinrawat895
• 11,380 points • 1,716 views

0 votes

1 answer

How is an RDD in Spark different from Distributed Storage Management? Can anyone help me with this?

Some of the key differences between an RDD and ...

answered Jul 26, 2018 in Apache Spark by zombie
• 3,790 points • 1,916 views

0 votes

1 answer

How to get ID of a map task in Spark?

You can access task information using TaskContext: import org.apache.spark.TaskContext sc.parallelize(Seq[Int](), ...

answered Nov 20, 2018 in Apache Spark by Frankie
• 9,830 points • 3,741 views

0 votes

1 answer

How to create a not null column in a case class in Spark

Hi @Deepak, in your test class you passed empid ...

answered May 14, 2020 in Apache Spark by MD
• 95,460 points • 5,574 views

0 votes

1 answer

3) You have a dataset of in-game purchases from mobile game users and you want to group these users for upsell. Which one of the Spark machine learning algorithms could you use?

linear regression

answered Jan 31, 2024 in Apache Spark by b

edited Mar 5 • 4,201 views
