How do I get the number of columns in each line of a delimited file?

0 votes

When I execute the line below in the Spark shell, I expect the file content to be split on "\n" and stored in lines.

val lines = sc.textFile("/user/test.txt").map(l => l.split("\n"));

When I do a collect on lines like this:

lines.collect()

The output is as below:

scala> lines.collect()
res76: Array[Array[String]] = Array(Array(~@00~@51~@DCS~@000009746~@1~@20190116~@170106), Array(~@51~@00~@1~@4397537~@3~@1~@1~@11~@16607475037~@272~@1521~@0~@0~@9~@AB2111756~@37~@20190112~@162954~@00000000~@1~@2000176746~@1~@88918773002073~@1~@3~@0~@0~@1~@008~@1~@889~@1~@000~@0~@0~@04), Array(~@51~@00~@1~@4397611~@3~@1~@1~@11~@16607475037~@272~...
scala>

Each line in the file is displayed as its own single-element array. Why is the result an array of arrays?

Now I need to know the number of columns in each line, where the columns are delimited by '~@'.

How do I do this?

Mar 8 in Apache Spark by Vijay Dixon
• 180 points
347 views

1 answer to this question.

+1 vote
There is no need to split on '\n': sc.textFile already returns an RDD with one element per line, which is why each line ends up wrapped in its own single-element array. To count the columns, follow these steps sequentially:

Use sc.textFile to create an RDD of the file.

Call a map transformation on the RDD; within the map, split each line on '~@' and count the resulting fields, or bind them to the fields of a case class if you need named columns.
answered Aug 7 by ashish
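The steps above can be sketched as follows. This is a minimal sketch, assuming the /user/test.txt path from the question; columnCount is a hypothetical helper name introduced here for illustration:

```scala
// Columns are delimited by "~@". String.split takes a regex, and "~@" contains
// no regex metacharacters, so it can be passed directly. The limit -1 keeps
// trailing empty columns; note that a leading "~@" (as in the sample data)
// produces an empty first column.
def columnCount(line: String): Int = line.split("~@", -1).length

// In the spark-shell (where sc is predefined), count the columns per line.
// sc.textFile already yields one element per line, so no split on "\n" is needed:
//   val counts = sc.textFile("/user/test.txt").map(columnCount)
//   counts.collect().foreach(println)
```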
