How to join columns that are non-NA?

0 votes

Consider a dataset with NAs in few rows and for some columns:

data <- data.table(ID=c(1, 2, 1:3), X=c(NA, NA, 1, NA, 3), Y=c(4, 5, NA, 5, 6), Z=c(7, 8, NA, NA, 9))
data
#    ID  X  Y  Z
# 1:  1 NA  4  7
# 2:  2 NA  5  8
# 3:  1  1 NA NA
# 4:  2 NA  5 NA
# 5:  3  3  6  9

Reference table is as follows:

ref <- data.table(ID=c(1, 1:3), X=c(1, 1:3), Y=c(1, 4:6), Z=c(1, 7, NA, 9), VALUE1=c(111, 101:103), VALUE2=c(112, 104:106))
ref
#    ID X Y  Z VALUE1 VALUE2
# 1:  1 1 1  1 111  112
# 2:  1 1 4  7 101  104
# 3:  2 2 5 NA 102  105
# 4:  3 3 6  9 103  106

I want to perform left join data on "data" with "reference table" using only non-NA columns for each row:

My desired output is as follows:

   ID  X  Y  Z VALUE1 VALUE2
1:  1 NA  4  7 101  104

2:  2 NA  5  8  NA   NA

3:  1  1 NA NA 111  112
4:  1  1 NA NA 101  104

5:  2 NA  5 NA 102  105

6:  3  3  6  9 103  106

How do I achieve this?

Apr 18, 2018 in Data Analytics by DataKing99
• 8,130 points
20 views

1 answer to this question.

0 votes

You can match on something like X=X OR is.na(X).

However for complicated merge situations, you can use sqldf  

Below is my code:

library(sqldf)
sqldf("SELECT l.*, r.VALUE1, r.VALUE2
       FROM       data as l
       LEFT JOIN  ref as r
       ON         l.ID = r.ID AND (l.X = r.X OR l.X IS NULL)
                  AND (l.Y = r.Y OR l.Y IS NULL)
                  AND (l.Z = r.Z OR l.Z IS NULL)
                  AND (l.X IS NOT NULL OR l.Y IS NOT NULL OR l.Z IS NOT NULL)")

#  ID  X  Y  Z VALUE1 VALUE2
#1  1 NA  4  7 101  104
#2  2 NA  5  8  NA   NA
#3  1  1 NA NA 111  112
#4  1  1 NA NA 101  104
#5  2 NA  5 NA 102  105
#6  3  3  6  9 103  106

NOTE: The last condition insures that if all of your X, Y, Z are NA then it won't match any rows.

answered Apr 18, 2018 by darklord
• 6,170 points

Related Questions In Data Analytics

0 votes
1 answer

How to join two tables (tibbles) by *list* columns in R

You can use the hash from digest ...READ MORE

answered Apr 5, 2018 in Data Analytics by kappa3010
• 2,020 points
65 views
0 votes
4 answers

How to remove NA values with dplyr::filter()

Can we create a alist as below ...READ MORE

answered Aug 5 in Data Analytics by anonymous
8,954 views
0 votes
1 answer
0 votes
1 answer
0 votes
1 answer

.SD in data.table in R

.SD stands for "Subset of Data.table". The ...READ MORE

answered Apr 12, 2018 in Data Analytics by nirvana
• 3,060 points
1,658 views
0 votes
1 answer

Big Data transformations with R

Dear Koushik, Hope you are doing great. You can ...READ MORE

answered Dec 17, 2017 in Data Analytics by Sudhir
• 1,610 points
54 views
0 votes
2 answers

Transforming a key/value string into distinct rows in R

We would start off by loading the ...READ MORE

answered Mar 26, 2018 in Data Analytics by Bharani
• 4,550 points
74 views
0 votes
1 answer

Finding frequency of observations in R

You can use the "dplyr" package to ...READ MORE

answered Mar 26, 2018 in Data Analytics by Bharani
• 4,550 points
174 views
0 votes
1 answer

How to subset data so that it contains only columns whose names match a condition

You can use grepl on the names ...READ MORE

answered Apr 26, 2018 in Data Analytics by darklord
• 6,170 points
37 views
0 votes
1 answer

How to replace NA with 0 using starts_with()

Well I could suggest various options such ...READ MORE

answered Apr 3, 2018 in Data Analytics by darklord
• 6,170 points
60 views