How to create a not-null column from a case class in Spark

0 votes
How can I create a non-nullable column when using a case class? Here is my code:

package com.spark.sparkpkg

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.Encoders
import org.apache.log4j.Logger
import org.apache.log4j.Level
object CaseClassSample extends App {
  Logger.getLogger("org").setLevel(Level.OFF)
  val spark = SparkSession.builder().master("local[*]").appName("caseClass").getOrCreate()
  
  import spark.implicits._
  case class test(empid :  String , userName : String)
  val ds=spark.read
             .option("header","true")
             .csv("C:/Users/dkumar77.EAD/Desktop/SparkData/11.csv").as[test]
   ds.show()
   ds.printSchema()

}

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
+-----+--------+
|empid|userName|
+-----+--------+
|    1|  Deepak|
| null|    Test|
+-----+--------+

root
 |-- empid: string (nullable = true) ****** this should be nullable=false
 |-- userName: string (nullable = true)

How can we do this? I tried a few things, such as Option[String], but they did not work. Can you please help?
May 14, 2020 in Apache Spark by Deepak
• 120 points
1,275 views

1 answer to this question.

0 votes

Hi@Deepak,

In your test case class you declared empid as a String, so Spark reads it in as a string column. To cast columns to other types, first import the type definitions:

import org.apache.spark.sql.types._

Then you can cast the columns in your program like this:

df.withColumn("empid", $"empid".cast(IntegerType))
df.withColumn("username", $"username".cast(StringType))

Note that cast only changes the column's data type; by itself it does not flip the nullable flag in the schema.
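If the goal is specifically to see nullable = false for empid, one common approach (a sketch, reusing the ds and spark values from the question; the schema and variable names here are my own) is to re-apply an explicit StructType via createDataFrame:

```scala
import org.apache.spark.sql.types._

// Explicit schema with empid marked non-nullable.
val notNullSchema = StructType(Seq(
  StructField("empid", StringType, nullable = false),
  StructField("userName", StringType, nullable = true)
))

// Drop rows where empid is null, then re-apply the schema.
// (Spark does not validate the data against the schema here,
// so filtering first avoids surprises at action time.)
val cleaned = ds.na.drop(Seq("empid"))
val dsNotNull = spark.createDataFrame(cleaned.rdd, notNullSchema)

dsNotNull.printSchema()
// root
//  |-- empid: string (nullable = false)
//  |-- userName: string (nullable = true)
```

The na.drop step matters because the sample CSV contains a row with a null empid; without it, the non-nullable schema would silently disagree with the data.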
answered May 14, 2020 by MD
• 95,180 points
