How to remove NA values with dplyr::filter()

0 votes

Below is the code:

library(tidyverse)
# tribble() builds a tibble row by row; tibble() does not accept ~col headers
df <- tribble(
    ~col1, ~col2, ~col3,
    1, 2, 3,
    1, NA, 3,
    NA, 2, 3
)

I can remove every row that contains an NA with drop_na():

df %>% drop_na()

Or remove only the rows that have an NA in a single column (col1, for example):

df %>% drop_na(col1)

Why can't I just use a regular != filter pipe?

df %>% filter(col1 != NA)

Why do we have to use a special function from tidyr to remove NAs?

Apr 3, 2018 in Data Analytics by DataKing99
• 8,100 points
3,306 views

3 answers to this question.

0 votes
This is not specific to dplyr::filter(). Any comparison with NA, including NA == NA, returns NA.

NA stands for a value that is unknown, and R has no way of knowing what that value means in your analysis.

So the comparison operators do not treat NA as an ordinary value: the result of comparing something to an unknown value is itself unknown.
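
A minimal base R sketch of the behaviour described above may help. The vector x is a made-up example that mirrors col1 from the question; nothing here needs dplyr.

```r
# Comparing anything to NA yields NA, not TRUE or FALSE
na_equal   <- NA == NA
na_unequal <- NA != 1

# Hypothetical vector mirroring col1 from the question
x <- c(1, 1, NA)

# Every element compared against NA becomes NA
cmp <- x != NA

# is.na() is the test that actually detects missing values
missing <- is.na(x)

print(cmp)
print(missing)
```

This is why is.na() is used instead of == or != when testing for missing values.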
answered Apr 3, 2018 by kappa3010
• 2,010 points
0 votes

Try this:

df %>% filter(!is.na(col1))
answered Mar 26 by anonymous
Thanks, that worked :)
0 votes
Missing values (NA) have no notion of equality in R. Therefore, NA == NA just returns NA, and so does NA compared with any other value. filter() keeps only the rows for which its condition is TRUE; rows where the condition is FALSE or NA are dropped. With filter(col1 != NA), the expression col1 != NA evaluates to NA for every row of col1, so no row is ever TRUE. The call runs without an error, but it returns an empty tibble instead of the rows you wanted.
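
As a rough illustration using the data from the question (rebuilt here with tibble() so the snippet stands alone), the following compares the two filters. The last call assumes dplyr 1.0.4 or later, where if_all() is available, and is only one possible way to mimic drop_na() on all columns.

```r
library(dplyr)

# Same data as in the question
df <- tibble(
  col1 = c(1, 1, NA),
  col2 = c(2, NA, 2),
  col3 = c(3, 3, 3)
)

# The condition is NA for every row, so filter() keeps nothing
n_compare <- nrow(filter(df, col1 != NA))

# is.na() returns TRUE/FALSE, so the two rows with a known col1 are kept
n_isna <- nrow(filter(df, !is.na(col1)))

# Assumption: dplyr >= 1.0.4. Keep rows with no NA in any column
n_complete <- nrow(filter(df, if_all(everything(), ~ !is.na(.x))))

print(c(n_compare, n_isna, n_complete))
```

The first count is 0, the second is 2, and the third is 1, matching what drop_na(col1) and drop_na() would keep for this data.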
answered Apr 11 by Zane
Thanks Zane! That was very well explained.
