How to create a not-null column in a case class in Spark

0 votes
How can I create a non-nullable column using a case class? Here is my code:

package com.spark.sparkpkg

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.Encoders
import org.apache.log4j.Logger
import org.apache.log4j.Level
object CaseClassSample extends App {
  Logger.getLogger("org").setLevel(Level.OFF)
  val spark = SparkSession.builder().master("local[*]").appName("caseClass").getOrCreate()

  import spark.implicits._
  case class test(empid: String, userName: String)
  val ds = spark.read
                .option("header", "true")
                .csv("C:/Users/dkumar77.EAD/Desktop/SparkData/11.csv")
                .as[test]
  ds.show()
  ds.printSchema()
}

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
+-----+--------+
|empid|userName|
+-----+--------+
|    1|  Deepak|
| null|    Test|
+-----+--------+

root
 |-- empid: string (nullable = true)    <-- this should be nullable = false
 |-- userName: string (nullable = true)

How can I do this? I tried a few things, like Option[String], but it did not work. Can you please help?
May 14 in Apache Spark by Deepak

1 answer to this question.

0 votes

Hi @Deepak,

In your test case class you declared empid as String, which is why Spark infers it as nullable = true. If you want to cast the columns, first import the types package:

import org.apache.spark.sql.types._

Then you can use casts like these in your program:

df.withColumn("empid", $"empid".cast(IntegerType))
df.withColumn("username", $"username".cast(StringType))
answered May 14 by MD
