Delimiter on the data

Question

I have a file with records as below.

s.no,name,Country
101,Raju,India,IN
102,Reddy,UnitedStates,US

Here the Country column holds "India,IN", which is a single value that itself contains a comma. How can I handle this data when reading the file with a comma delimiter in Spark (Scala)? I tried split(","), but it did not give the expected output.

For example, the expected output for the first record is:

S.no: 101
name: Raju
Country: India,IN

score 0 · Answer 1 · Jul 25, 2019

You can combine two columns into a single struct column like this:

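import org.apache.spark.sql.functions.struct

// toDF on a Seq needs spark.implicits._ (already in scope in spark-shell)
val df = Seq((1,2), (3,4), (5,3)).toDF("a", "b")

// `new` is a reserved word in Scala, so the result needs a different name
val combined = df.withColumn("NewColumn", struct(df("a"), df("b")))

combined.show()

+---+---+---------+
|a  |b  |NewColumn|
+---+---+---------+
|1  |2  |[1,2]    |
|3  |4  |[3,4]    |
|5  |3  |[5,3]    |
+---+---+---------+

// Drop the original columns and keep only the combined one
val data = combined.drop("a", "b")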
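
If the file is read as plain text instead, another option is to split each line with a limit, so that everything after the second comma stays in the last field. Here is a minimal sketch in plain Scala, assuming exactly three logical columns and that only the last one can contain commas. `Record` and `parseLine` are illustrative names, not part of any Spark API:

```scala
import scala.util.Try

// Hypothetical record type for the three logical columns in the question.
final case class Record(sno: Int, name: String, country: String)

// Split on at most the first two commas; the remainder (commas included)
// stays in the third element. Assumes only the last column contains commas.
def parseLine(line: String): Option[Record] =
  line.split(",", 3) match {
    case Array(sno, name, country) =>
      // A non-numeric s.no (for example the header row) yields None.
      Try(sno.trim.toInt).toOption.map(n => Record(n, name, country))
    case _ => None // fewer than three fields
  }

val lines = Seq(
  "s.no,name,Country",
  "101,Raju,India,IN",
  "102,Reddy,UnitedStates,US"
)

val records = lines.flatMap(line => parseLine(line))
records.foreach(r => println(s"S.no: ${r.sno}, name: ${r.name}, Country: ${r.country}"))
```

The same function could probably be applied to each line of a Dataset[String] obtained from spark.read.textFile, though I have not tested that against your file.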