How do I get the number of columns in each line from a delimited file?

+1 vote

When I execute the below in the Spark shell, I expect the file content to be split on "\n" and put in lines.

val lines = sc.textFile("/user/test.txt").map(l => l.split("\n"));

When I do a collect on lines like this:

lines.collect()

The output is as below

scala> lines.collect()
res76: Array[Array[String]] = Array(Array(~@00~@51~@DCS~@000009746~@1~@20190116~@170106), Array(~@51~@00~@1~@4397537~@3~@1~@1~@11~@16607475037~@272~@1521~@0~@0~@9~@AB2111756~@37~@20190112~@162954~@00000000~@1~@2000176746~@1~@88918773002073~@1~@3~@0~@0~@1~@008~@1~@889~@1~@000~@0~@0~@04), Array(~@51~@00~@1~@4397611~@3~@1~@1~@11~@16607475037~@272~...
scala>

Why is each line in the file displayed as its own array, giving an array of arrays?

Now I need to know the number of columns in each line, delimited with '~@'.

How do I do this?

Mar 8, 2019 in Apache Spark by Vijay Dixon
• 190 points
5,578 views

2 answers to this question.

+2 votes
sc.textFile already returns one record per line, so splitting each line on '\n' just wraps it in a one-element array; that is why you see an array of arrays. Instead of splitting on '\n', define a case class for the fields in a line, then:

follow the below steps sequentially:

use sc.textFile to create an RDD of the file.

call a map transformation on the RDD; inside the map, split each line on '~@' and bind the pieces to the fields defined in the case class. The length of the split array is the number of columns in that line. A sketch follows below.
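A minimal sketch for the Spark shell (the case class name Record and its field names are hypothetical placeholders, not from the original thread):

// sc.textFile already gives one record per line, so split directly on
// the '~@' delimiter. The -1 limit keeps trailing empty columns.
// Note: the sample lines start with '~@', so element 0 will be "".
case class Record(recordType: String, messageId: String, system: String)

val fields = sc.textFile("/user/test.txt").map(_.split("~@", -1))

// Number of columns in each line:
fields.map(_.length).collect().foreach(println)

// Bind the first three fields to the case class (assumes every line
// has at least three columns):
val records = fields.map(a => Record(a(0), a(1), a(2)))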
answered Aug 7, 2019 by ashish
0 votes
You can use df.foreach(println) to print each row (foreach takes the function as its argument).
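For the column-count question specifically, a collect-then-print variant in the same spirit could look like this (collect() pulls everything to the driver, so only do this on small files):

sc.textFile("/user/test.txt")
  .map(_.split("~@", -1).length)   // columns per line, delimited by '~@'
  .collect()
  .foreach(println)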
answered Apr 4, 2020 by SaiSowhit
