How to replace null values in Spark DataFrame?

0 votes


I want to remove the null values from a CSV file, so I tried the following.

val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("/usr/local/spark/cars.csv")

After loading the file, the DataFrame looks as shown below. Now I want to remove the null values.
[Screenshot of the loaded DataFrame, not reproduced]

So I did this:

df.na.fill("e",Seq("blank"))
But the null values didn't change. Can anyone help me?

May 31, 2018 in Apache Spark by kurt_cobain
• 9,240 points
13,674 views

6 answers to this question.

0 votes
This is simple: na.fill doesn't change df in place. It returns a new DataFrame, so you need to keep the result. Using the df you defined earlier:

val newDf = df.na.fill("e",Seq("blank"))

DataFrames are immutable. Every transformation returns a new DataFrame, so whenever you want to keep the result of one, assign it to a new value.
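A minimal sketch of this point, assuming Spark 2.x or later with a local SparkSession rather than the sqlContext from the question. The two-row DataFrame and its model/blank columns are hypothetical stand-ins for cars.csv. It checks that na.fill leaves df untouched and that only the returned DataFrame holds the replacement. Note that the String overload of fill only affects string-typed columns, so a listed column of another type is left unchanged.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object FillIsNotInPlace {
  // Local session for the sketch; spark-shell already provides one as `spark`
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("fill-is-not-in-place")
    .getOrCreate()

  // Hypothetical stand-in for cars.csv: the "blank" column contains one null
  lazy val df: DataFrame = {
    val schema = StructType(Seq(
      StructField("model", StringType, nullable = true),
      StructField("blank", StringType, nullable = true)))
    val rows = Seq(Row("ford", null), Row("audi", "x"))
    spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
  }

  // Calling df.na.fill(...) without keeping the result changes nothing;
  // the filled values exist only in the returned DataFrame
  lazy val newDf: DataFrame = df.na.fill("e", Seq("blank"))

  // The "blank" values of a DataFrame, ordered by model for a stable comparison
  def blanks(d: DataFrame): Seq[String] =
    d.orderBy("model").collect().map(_.getAs[String]("blank")).toSeq
}
```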
answered May 31, 2018 by nitinrawat895
• 10,030 points
0 votes
You can also pass a Map from column name to replacement value to fill several columns in one call:

val map = Map("comment" -> "a", "blank" -> "a2")

df.na.fill(map).show()
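A minimal sketch of the Map variant, assuming Spark 2.x or later; the name/age columns and the replacement values are hypothetical. Each key selects the column to fill, and each value is given the same type as its column (a String for the string column, an integer for the integer column). Since show() only prints, the sketch assigns the filled DataFrame instead.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object FillWithMap {
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("fill-with-map")
    .getOrCreate()

  // Hypothetical data: the second row has nulls in both columns
  lazy val df: DataFrame = {
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("age", IntegerType, nullable = true)))
    val rows = Seq(Row("ann", 30), Row(null, null))
    spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
  }

  // Column name -> replacement value; each value's type matches its column
  val replacements: Map[String, Any] = Map("name" -> "unknown", "age" -> 0)

  // Keep the filled DataFrame rather than only calling show() on it
  lazy val filled: DataFrame = df.na.fill(replacements)

  // Rows as (name, age) pairs, sorted by name for a stable comparison
  def rowsOf(d: DataFrame): Seq[(String, Int)] =
    d.collect().map(r => (r.getAs[String]("name"), r.getAs[Int]("age"))).toSeq.sortBy(_._1)
}
```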
answered Dec 10, 2018 by Sute
0 votes
In Java, na() needs parentheses and the column names go in a String array:
df1 = df.na().fill("e", new String[]{"blank"});
answered Dec 10, 2018 by Shanti
0 votes
String[] colNames = {"NameOfColumn"};
dataframe = dataframe.na().fill("ValueToBeFilled", colNames);
answered Dec 10, 2018 by Sada
0 votes
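import org.apache.spark.sql.functions.udf

// Returns None (a null in the result column) for null input instead of throwing a NullPointerException
def isEvenOption(n: Integer): Option[Boolean] =
  Option(n).map(num => num % 2 == 0)

val isEvenOptionUdf = udf[Option[Boolean], Integer](isEvenOption)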

Source: Dealing with null in Spark
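This answer is about null-safe UDFs rather than filling nulls: because the function wraps its input in Option, a null input gives a null result instead of a NullPointerException. A minimal sketch of applying it to a column, assuming Spark 2.x or later; the number column and its values are hypothetical. The function is written as a function value here so the UDF closure does not capture the enclosing object.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.{col, udf}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

object NullSafeUdf {
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("null-safe-udf")
    .getOrCreate()

  // Same logic as isEvenOption above: None for null input, otherwise Some(isEven)
  val isEvenOption: Integer => Option[Boolean] = n => Option(n).map(num => num % 2 == 0)

  // Spark stores None as null in the resulting boolean column
  val isEvenOptionUdf = udf[Option[Boolean], Integer](isEvenOption)

  // Hypothetical input with one null value
  lazy val df: DataFrame = {
    val schema = StructType(Seq(StructField("number", IntegerType, nullable = true)))
    val rows = Seq(Row(1), Row(2), Row(null))
    spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
  }

  lazy val result: DataFrame = df.withColumn("is_even", isEvenOptionUdf(col("number")))

  // Read is_even back as Option[Boolean]; ascending order puts the null row first
  def isEvenValues: Seq[Option[Boolean]] =
    result.orderBy("number").collect().map { r =>
      val i = r.fieldIndex("is_even")
      if (r.isNullAt(i)) None else Some(r.getBoolean(i))
    }.toSeq
}
```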

answered Dec 10, 2018 by Mohan
0 votes

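Hi, I hope this helps. You can set the nullValue option when reading the file:

option("nullValue","defaultvalue")

Note that nullValue tells the CSV reader which string marks a missing value (here "defaultvalue" stands for whatever marker your file uses, such as NA); every field equal to it is loaded as null. It does not fill nulls with a default, so you still need na.fill after loading to replace them. The full read looks like this: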

val df = sqlContext.read.format("com.databricks.spark.csv").option("nullValue","defaultvalue").option("header", "true").load("/usr/local/spark/cars.csv")

answered Feb 5 by Srinivasreddy
• 140 points
Sir, can you please explain this code?
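Following on from the nullValue answer, a minimal sketch assuming Spark 2.x or later, where the built-in csv reader accepts the same header and nullValue options. The temporary file, its model/color columns, and the "NA" marker are hypothetical. It shows nullValue turning "NA" into real nulls at load time, after which na.fill supplies the default.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import org.apache.spark.sql.{DataFrame, SparkSession}

object NullValueThenFill {
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("null-value-then-fill")
    .getOrCreate()

  // Hypothetical CSV in which a missing value is written as the string "NA"
  lazy val csvPath: Path = {
    val p = Files.createTempFile("cars", ".csv")
    Files.write(p, "model,color\nford,NA\naudi,red\n".getBytes(StandardCharsets.UTF_8))
    p
  }

  // nullValue only says which string means null; it does not supply a default
  lazy val raw: DataFrame = spark.read
    .option("header", "true")
    .option("nullValue", "NA")
    .csv(csvPath.toString)

  // Supplying the default is still a separate na.fill step
  lazy val filled: DataFrame = raw.na.fill("unknown", Seq("color"))

  // The "color" values ordered by model for a stable comparison
  def colors(d: DataFrame): Seq[String] =
    d.orderBy("model").collect().map(_.getAs[String]("color")).toSeq
}
```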
