Is there any efficient way of dealing null values during concat functionality of pyspark sql version 2 3 4

+1 vote
Nov 5, 2019 in Apache Spark by aizhar
• 130 points
42,159 views

1 answer to this question.

+1 vote

When you concatenate any string with a NULL value, it will result in NULL. To avoid this, you can use the COALESCE function. 

spark.sql(SELECT COALESCE(Name, '') + ' '+ COALESCE(Column2, '') AS Result FROM table_test).show()

The COALESCE function returns the first non-Null value. So, when there is a value in the column that is not null, that will be concatenated. And if the value in the column is null, then an empty string will be concatenated.

After that it will work.

To know more about Pyspark, it's recommended that you join Pyspark Training today.

Thanks.

answered Nov 6, 2019 by Rishi