How to replace null values in Spark DataFrame?

0 votes


I want to replace the null values in a CSV file, so I tried the following:

val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("/usr/local/spark/cars.csv")

After loading, the DataFrame looks as shown below (screenshot omitted). Now, I want to replace the null values.

So, I do this :

df.na.fill("e",Seq("blank"))
But the null values didn't change. Can anyone help me?

May 31, 2018 in Apache Spark by kurt_cobain
• 9,320 points
49,482 views

7 answers to this question.

0 votes
This is basically very simple. You'll need to create a new DataFrame. I'm using the DataFrame df that you have defined earlier.

val newDf = df.na.fill("e",Seq("blank"))

DataFrames are immutable structures. Each time you perform a transformation that you want to keep, you need to assign the transformed DataFrame to a new value.
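For example, a minimal sketch (assuming Spark 2.x with a SparkSession named spark; the sample data and the "blank" column are made up for illustration):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("fill-nulls").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical data: the "blank" column contains a null
val sampleDf = Seq(("ford", Some("fiesta")), ("toyota", None)).toDF("make", "blank")

// fill returns a NEW DataFrame; sampleDf itself is unchanged
val filledDf = sampleDf.na.fill("e", Seq("blank"))

sampleDf.show()  // "blank" still contains null
filledDf.show()  // null in "blank" replaced with "e"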
answered May 31, 2018 by nitinrawat895
• 10,950 points
0 votes
val map = Map("comment" -> "a", "blank" -> "a2")

df.na.fill(map).show()
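With the Map form, each key is a column name and the corresponding value is what nulls in that column are replaced with; the replacement must match the column's type (strings for string columns, numbers for numeric columns). The columns "comment" and "blank" are assumed to exist in df here.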
answered Dec 10, 2018 by Sute
0 votes
val df1 = df.na.fill("e", Seq("blank"))
answered Dec 10, 2018 by Shanti
0 votes
// Java API: note na() with parentheses and the trailing semicolons
String[] colNames = {"NameOfColumn"};
dataframe = dataframe.na().fill("ValueToBeFilled", colNames);
answered Dec 10, 2018 by Sada
0 votes
import org.apache.spark.sql.functions.udf

// Wrap the (nullable) Integer in an Option so a null input yields None
// instead of throwing a NullPointerException
def isEvenOption(n: Integer): Option[Boolean] = {
  val num = Option(n).getOrElse(return None)
  Some(num % 2 == 0)
}

// Register the function as a UDF that returns null (None) for null inputs
val isEvenOptionUdf = udf[Option[Boolean], Integer](isEvenOption)

Source: Dealing with null in Spark
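As a usage sketch (assuming a SparkSession named spark; the DataFrame and its "number" column are made up for illustration), null inputs come back as null instead of throwing:

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Hypothetical nullable integer column "number"
val schema = StructType(Seq(StructField("number", IntegerType, nullable = true)))
val rows = Seq(Row(1), Row(8), Row(null))
val numbersDf = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)

// Rows where "number" is null get null in "is_even" rather than failing the job
numbersDf.withColumn("is_even", isEvenOptionUdf(col("number"))).show()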

answered Dec 10, 2018 by Mohan
To drop the rows that contain nulls instead of replacing them, we have to use drop():

DF.na.drop()
  .show(false)

drop() will remove every row containing a null value from the DF.
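A few common drop() variants, as a sketch (the column names are placeholders):

// Drop rows where ANY column is null (default behaviour)
DF.na.drop()

// Drop rows only when ALL columns are null
DF.na.drop("all")

// Only look at specific columns when checking for nulls
DF.na.drop(Seq("blank", "comment"))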
0 votes

Hi, I hope this helps:

option("nullValue","defaultvalue")

val df = sqlContext.read.format("com.databricks.spark.csv").option("nullValue", "defaultvalue").option("header", "true").load("/usr/local/spark/cars.csv")

answered Feb 5, 2019 by Srinivasreddy
• 140 points
Is a closing parenthesis missing at the end of the command?
Sir, can you please explain this code?
0 votes
In Spark 2.x you can use df.dropna() directly to drop rows containing null values from the DataFrame.
answered Mar 28 by gaurav
