How to join columns that are non-NA

Question

Consider a dataset with NAs in few rows and for some columns:

data <- data.table(ID=c(1, 2, 1:3), X=c(NA, NA, 1, NA, 3), Y=c(4, 5, NA, 5, 6), Z=c(7, 8, NA, NA, 9))
data
#    ID  X  Y  Z
# 1:  1 NA  4  7
# 2:  2 NA  5  8
# 3:  1  1 NA NA
# 4:  2 NA  5 NA
# 5:  3  3  6  9

Reference table is as follows:

ref <- data.table(ID=c(1, 1:3), X=c(1, 1:3), Y=c(1, 4:6), Z=c(1, 7, NA, 9), VALUE1=c(111, 101:103), VALUE2=c(112, 104:106))
ref
#    ID X Y  Z VALUE1 VALUE2
# 1:  1 1 1  1 111  112
# 2:  1 1 4  7 101  104
# 3:  2 2 5 NA 102  105
# 4:  3 3 6  9 103  106

I want to perform left join data on "data" with "reference table" using only non-NA columns for each row:

My desired output is as follows:

   ID  X  Y  Z VALUE1 VALUE2
1:  1 NA  4  7 101  104

2:  2 NA  5  8  NA   NA

3:  1  1 NA NA 111  112
4:  1  1 NA NA 101  104

5:  2 NA  5 NA 102  105

6:  3  3  6  9 103  106

How do I achieve this?

Sahiti · Answer 1 · Apr 18, 2018

You can match on something like X=X OR is.na(X).

However for complicated merge situations, you can use sqldf

Below is my code:

library(sqldf)
sqldf("SELECT l.*, r.VALUE1, r.VALUE2
       FROM       data as l
       LEFT JOIN  ref as r
       ON         l.ID = r.ID AND (l.X = r.X OR l.X IS NULL)
                  AND (l.Y = r.Y OR l.Y IS NULL)
                  AND (l.Z = r.Z OR l.Z IS NULL)
                  AND (l.X IS NOT NULL OR l.Y IS NOT NULL OR l.Z IS NOT NULL)")

#  ID  X  Y  Z VALUE1 VALUE2
#1  1 NA  4  7 101  104
#2  2 NA  5  8  NA   NA
#3  1  1 NA NA 111  112
#4  1  1 NA NA 101  104
#5  2 NA  5 NA 102  105
#6  3  3  6  9 103  106