How do I get the number of columns in each line from a delimited file?

+1 vote

When I execute the line below in the Spark shell, I expect the file content to be split on "\n" and stored in lines.

val lines = sc.textFile("/user/test.txt").map(l => l.split("\n"));

When I do a collect on lines like this 

lines.collect()

The output is as follows:

scala> lines.collect()
res76: Array[Array[String]] = Array(Array(~@00~@51~@DCS~@000009746~@1~@20190116~@170106), Array(~@51~@00~@1~@4397537~@3~@1~@1~@11~@16607475037~@272~@1521~@0~@0~@9~@AB2111756~@37~@20190112~@162954~@00000000~@1~@2000176746~@1~@88918773002073~@1~@3~@0~@0~@1~@008~@1~@889~@1~@000~@0~@0~@04), Array(~@51~@00~@1~@4397611~@3~@1~@1~@11~@16607475037~@272~...
scala>

Each line in the file is displayed as an array of arrays. Why is that?

Now I need to know the number of columns in each line, delimited by '~@'.

How do I do this?

Mar 8, 2019 in Apache Spark by Vijay Dixon
• 190 points
4,958 views

2 answers to this question.

+2 votes
Instead of splitting on '\n', define a case class with a field for each column in a line. Note that sc.textFile already splits the file on '\n', so mapping split("\n") over it just wraps each line in a one-element array, which is why you see an array of arrays.

Follow these steps sequentially:

Use sc.textFile to create an RDD from the file.

Call a map transformation on the RDD; within it, split each line on '~@' and bind the fields to the case class. To get the number of columns in each line, map each line to split("~@").length, as in the sketch below.
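
A minimal sketch of both steps for the spark-shell (where sc already exists), assuming the path from the question; the Record case class and its field names are hypothetical, so adjust them to your actual columns:

// Hypothetical case class; define one field per actual column.
case class Record(recordType: String, source: String, dest: String)

val lines = sc.textFile("/user/test.txt")

// Count the columns in each line by splitting on the '~@' delimiter.
// Note: lines that start with '~@' produce a leading empty field.
val columnCounts = lines.map(line => line.split("~@").length)
columnCounts.collect().foreach(println)

// Bind the first three fields to the hypothetical case class
// (assumes every line has at least three fields).
val records = lines.map { line =>
  val fields = line.split("~@")
  Record(fields(0), fields(1), fields(2))
}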
answered Aug 7, 2019 by ashish
0 votes
If the data is in a DataFrame, you can print each row with df.foreach(row => println(row)); see the sketch below for counting columns this way.
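
A minimal sketch of this approach, assuming a SparkSession named spark (as in the spark-shell) and the file path from the question:

// Read the file as a DataFrame with a single string column named "value".
val df = spark.read.text("/user/test.txt")

// Print each line's column count. Note that foreach runs on the executors,
// so on a cluster the output appears in the executor logs, not on the driver.
df.foreach(row => println(row.getString(0).split("~@").length))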
answered Apr 4, 2020 by SaiSowhit
