Is there an efficient way to handle null values when concatenating columns in PySpark SQL (version 2.3.4)?

Question

As you can see in the screenshot, if any attribute in a table has a null value, the concatenated result becomes null. In SQL the result is nonullcol + nullcol = nonullcol, while Spark gives me null. Please suggest a solution for this problem. Thanks in advance.

MD · Answer 1 · Nov 6, 2019

When you concatenate any string with a NULL value, it will result in NULL. To avoid this, you can use the COALESCE function.

spark.sql("SELECT CONCAT(COALESCE(Name, ''), ' ', COALESCE(Column2, '')) AS Result FROM table_test").show()

The COALESCE function returns the first non-null value among its arguments. So when the column holds a value, that value is concatenated, and when the column is null, an empty string is concatenated instead.
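With more than two columns, writing the COALESCE wrappers by hand gets tedious. Below is a minimal sketch of a helper that builds the expression as a string. The function name `null_safe_concat_expr` is hypothetical, and `Name`, `Column2`, and `table_test` are taken from the query above. It only builds text, so it runs without a Spark session.

```python
def null_safe_concat_expr(columns, sep=" "):
    """Build a Spark SQL CONCAT expression in which NULL columns become ''.

    Hypothetical helper for illustration. Column names are inserted as-is,
    so pass only trusted identifiers.
    """
    if not columns:
        raise ValueError("at least one column is required")
    # Wrap every column so a NULL turns into an empty string.
    wrapped = [f"COALESCE({c}, '')" for c in columns]
    # Interleave the quoted separator between the wrapped columns.
    return "CONCAT(" + f", '{sep}', ".join(wrapped) + ")"


expr = null_safe_concat_expr(["Name", "Column2"])
print(expr)
# With an active SparkSession named `spark` (assumed), this could be used as:
# spark.sql(f"SELECT {expr} AS Result FROM table_test").show()
```

Note that with this approach the separator is still emitted when a column is null. Spark's CONCAT_WS(sep, ...) skips null arguments, so it may be a simpler alternative if a leftover separator is not wanted.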

After that it will work.


Thanks.

answered Nov 6, 2019 by Rishi

Can you please suggest how I can concatenate a date column that contains null values?

Thanks in Advance.

Pravin

commented Sep 10, 2020 by Pravin

Hi @Pravin,

You can replace the null values with a placeholder value, for example 0, which avoids the null problem. See the example below.

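from pyspark.sql.functions import unix_timestamp, when

# Null or empty dates become 0; all other values are converted to a Unix timestamp (seconds).
# A numeric 0 (not the string '0') keeps both branches of when/otherwise numeric.
df\
.withColumn('Created-formatted', when(df.Created.isNull() | (df.Created == ''), 0)\
.otherwise(unix_timestamp(df.Created, 'yyyy-MM-dd')))\
.withColumn('EventDate-formatted', when(df.EventDate.isNull() | (df.EventDate == ''), 0)\
.otherwise(unix_timestamp(df.EventDate, 'yyyy-MM-dd')))\
.drop('Created', 'EventDate')\
.show()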

Before running this, check the date format in your dataset and adjust the 'yyyy-MM-dd' pattern accordingly.
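If the date should stay readable in the concatenated result rather than become a Unix timestamp, one option is to format it as a string first and then apply COALESCE. This is a minimal sketch, assuming `Created` is a date or timestamp column and `Name` is a string column. The helper name `date_concat_expr` and the table name `my_table` are hypothetical. The helper only builds the SQL text, so it runs without a Spark session.

```python
def date_concat_expr(text_col, date_col, fmt="yyyy-MM-dd", sep=" "):
    """Build a Spark SQL expression that concatenates a text column and a date column.

    Hypothetical helper for illustration. Column names are inserted as-is.
    """
    # DATE_FORMAT yields NULL for a NULL date, so COALESCE swaps in ''.
    date_part = f"COALESCE(DATE_FORMAT({date_col}, '{fmt}'), '')"
    text_part = f"COALESCE({text_col}, '')"
    return f"CONCAT({text_part}, '{sep}', {date_part})"


expr = date_concat_expr("Name", "Created")
print(expr)
# With an active SparkSession named `spark` (assumed) and a registered table
# `my_table` (hypothetical), this could be used as:
# spark.sql(f"SELECT {expr} AS Result FROM my_table").show()
```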