How to create a not-null column in a case class in Spark

0 votes
How can I create a non-nullable column when reading into a case class?

package com.spark.sparkpkg

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.Encoders
import org.apache.log4j.Logger
import org.apache.log4j.Level
object CaseClassSample extends App {
  Logger.getLogger("org").setLevel(Level.OFF)
  val spark = SparkSession.builder().master("local[*]").appName("caseClass").getOrCreate()

  import spark.implicits._

  case class test(empid: String, userName: String)

  val ds = spark.read
    .option("header", "true")
    .csv("C:/Users/dkumar77.EAD/Desktop/SparkData/11.csv")
    .as[test]

  ds.show()
  ds.printSchema()
}

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
+-----+--------+
|empid|userName|
+-----+--------+
|    1|  Deepak|
| null|    Test|
+-----+--------+

root
 |-- empid: string (nullable = true)   <-- this should be nullable = false
 |-- userName: string (nullable = true)

How can we do this? I tried a few things, such as Option[String], but it did not work. Can you please help?
May 14, 2020 in Apache Spark by Deepak
• 120 points
4,989 views

1 answer to this question.

0 votes

Hi@Deepak,

In your test case class you declared empid as String, but the case class is not what drives nullability here: Spark's CSV reader marks every column as nullable, and .as[test] does not override the nullability that comes from the file's schema. Also note that casting changes a column's data type, not its nullability. To cast, first import the types package:

import org.apache.spark.sql.types._

Then you can use code like this in your program (it needs import spark.implicits._ for the $ syntax):

df.withColumn("empid", $"empid".cast(IntegerType))
df.withColumn("username", $"username".cast(StringType))

To actually get nullable = false, supply an explicit schema with StructField(..., nullable = false) instead of letting Spark infer one.
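As a minimal sketch of the explicit-schema approach (assuming the same two-column CSV and a local Spark session; the object name NotNullSchemaSketch is made up for illustration): note that some Spark versions relax user-supplied nullability for file-based sources, so rebuilding the DataFrame from its underlying RDD is a common way to force the non-nullable flag through.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object NotNullSchemaSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("notNullSchema").getOrCreate()

  // Explicit schema: empid is declared non-nullable, userName stays nullable.
  val schema = StructType(Seq(
    StructField("empid", IntegerType, nullable = false),
    StructField("userName", StringType, nullable = true)
  ))

  // Read with the user-supplied schema instead of letting Spark infer one.
  val df = spark.read
    .option("header", "true")
    .schema(schema)
    .csv("C:/Users/dkumar77.EAD/Desktop/SparkData/11.csv")

  // File sources may still report nullable = true; createDataFrame with an
  // explicit schema preserves the nullable flags as declared.
  val strict = spark.createDataFrame(df.rdd, schema)
  strict.printSchema()
}
```

Be aware that marking a field non-nullable this way asserts a constraint rather than validating it: if the CSV actually contains a null empid (as in your sample output), downstream operations may fail or behave unexpectedly, so you may want to filter or .na.drop those rows first.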
answered May 14, 2020 by MD
• 95,460 points
