How do I get the number of columns in each line of a delimited file?

0 votes

When I execute the line below in the Spark shell, I expect the file content to be split on "\n" and stored in lines.

val lines = sc.textFile("/user/test.txt").map(l => l.split("\n"));

When I do a collect on lines like this:

lines.collect()

The output is as below

scala> lines.collect()
res76: Array[Array[String]] = Array(Array(~@00~@51~@DCS~@000009746~@1~@20190116~@170106), Array(~@51~@00~@1~@4397537~@3~@1~@1~@11~@16607475037~@272~@1521~@0~@0~@9~@AB2111756~@37~@20190112~@162954~@00000000~@1~@2000176746~@1~@88918773002073~@1~@3~@0~@0~@1~@008~@1~@889~@1~@000~@0~@0~@04), Array(~@51~@00~@1~@4397611~@3~@1~@1~@11~@16607475037~@272~...
scala>

Why is each line in the file displayed as an array of arrays?

Now I need to know the number of columns in each line, delimited by '~@'.

How do I do this?
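For what it's worth, sc.textFile already returns one record per line, so the extra .split("\n") wraps each line in a single-element array, which is why collect() shows an Array[Array[String]]. One way to count the fields is to split each line on the literal "~@" delimiter instead (String.split takes a regex, so quoting the delimiter is the safe habit) and take the resulting array's length. A minimal sketch, using a sample line shaped like the output in the question (the Spark-side mapping is shown in comments, since it needs a running shell):

```scala
object ColumnCount {
  def main(args: Array[String]): Unit = {
    // Hypothetical sample line, shaped like the collect() output above
    val line = "~@00~@51~@DCS~@000009746~@1~@20190116~@170106"

    // split takes a regex, so quote the literal "~@" delimiter;
    // the -1 limit keeps any trailing empty fields
    val fields = line.split(java.util.regex.Pattern.quote("~@"), -1)
    println(fields.length) // note: the leading "~@" produces an empty first field

    // In the Spark shell, the same logic maps over the RDD of lines:
    //   val counts = sc.textFile("/user/test.txt")
    //     .map(_.split(java.util.regex.Pattern.quote("~@"), -1).length)
    //   counts.collect()
  }
}
```

Because the sample lines start with the delimiter, the first field is an empty string; filter it out (or subtract one) if only the populated columns should be counted.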

Mar 8 in Apache Spark by Vijay Dixon
• 180 points
38 views

No answer to this question. Be the first to respond.


