How to remove NA values with dplyr::filter()

0 votes

Below is the code:

library(tidyverse)
df <- tribble(
    ~col1, ~col2, ~col3,
    1, 2, 3,
    1, NA, 3,
    NA, 2, 3
)

I can remove every row that contains an NA with drop_na():

df %>% drop_na()

Or remove only the rows that have an NA in a single column (col1 for example):

df %>% drop_na(col1)

Why can't I just use a regular != filter pipe?

df %>% filter(col1 != NA)

Why do we have to use a special function from tidyr to remove NAs?

Apr 3, 2018 in Data Analytics by DataKing99
• 8,210 points
79,399 views

4 answers to this question.

0 votes
This is not specific to dplyr::filter(). Any comparison with NA, including NA == NA, returns NA.

NA stands for a missing (unknown) value, and R has no way of knowing what that value would have been in your analysis.

So the comparison operators do not treat NA as an ordinary value that something can be equal or unequal to.
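
A minimal sketch of that behaviour in base R. The vector x is a made-up example that mirrors col1 from the question:

```r
# Any comparison that involves NA yields NA, because the missing value is unknown.
na_equals_na <- NA == NA      # NA
na_differs   <- NA != 1       # NA

# Hypothetical vector with the same values as col1 in the question.
x <- c(1, 1, NA)

cmp     <- x != NA            # NA NA NA: never TRUE, never FALSE
missing <- is.na(x)           # FALSE FALSE TRUE
present <- !is.na(x)          # TRUE TRUE FALSE

print(cmp)
print(present)
```

is.na() is the function meant for this test; it always returns TRUE or FALSE, never NA.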
answered Apr 3, 2018 by kappa3010
• 2,090 points
0 votes

Try this:

df %>% filter(!is.na(col1))
answered Mar 26, 2019 by anonymous
Thanks, that worked :)
This was simple, direct and perfect...thank you!
What if we have 2 columns that may contain NA values?
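
One possible approach for two columns, sketched with the df from the question (rebuilt here with tribble() so the block runs on its own). The names both_filter, both_drop and both_if_all are only illustrative, and if_all() assumes dplyr 1.0.4 or later:

```r
library(tidyverse)

df <- tribble(
  ~col1, ~col2, ~col3,
  1,     2,     3,
  1,     NA,    3,
  NA,    2,     3
)

# Conditions separated by commas are combined with AND,
# so a row is kept only if both columns are non-missing.
both_filter <- df %>% filter(!is.na(col1), !is.na(col2))

# drop_na() accepts several columns and keeps the same rows here.
both_drop <- df %>% drop_na(col1, col2)

# if_all() applies one predicate across a selection of columns.
both_if_all <- df %>% filter(if_all(c(col1, col2), ~ !is.na(.x)))

print(both_filter)
```

All three variants keep only the first row of df, since it is the only row where col1 and col2 are both present.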
0 votes
Missing values have no notion of equality in R, so NA == NA just returns NA. In fact, comparing NA with any value using == or != returns NA. filter() in dplyr keeps only the rows for which the condition is TRUE. With filter(col1 != NA), the condition 'col1 != NA' evaluates to NA for every row of col1 and is never TRUE, so every row is dropped and the result is empty instead of containing the rows you wanted.
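
A small sketch of what happens with this condition, using the df from the question (rebuilt with tribble() so the block is self-contained). The variable names are illustrative only:

```r
library(tidyverse)

df <- tribble(
  ~col1, ~col2, ~col3,
  1,     2,     3,
  1,     NA,    3,
  NA,    2,     3
)

# The comparison yields a logical vector in which every element is NA.
cond <- df$col1 != NA
print(cond)

# filter() keeps a row only when its condition is TRUE; NA counts as "not TRUE".
wrong <- df %>% filter(col1 != NA)      # no rows survive
right <- df %>% filter(!is.na(col1))    # rows where col1 is present

print(nrow(wrong))
print(nrow(right))
```

Note that base subsetting with [ behaves differently: an NA in the index produces a row of NAs instead of dropping the row.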
answered Apr 11, 2019 by Zane
Thanks Zane! That was very well explained.
0 votes
Could we build a list of the positions of the rows that have no missing values, as below, and then pass that list to a dplyr function to select rows by position?

Here na.omit(airquality$Ozone) returns the non-missing values of the column.

Could the list of positions then be given to the filter function?
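
A rough sketch of how that idea could work, assuming the built-in airquality data set. Two details matter: na.omit() returns the remaining values, not their positions, so which() is used here to get positions; and filter() expects a logical condition, so slice() is the dplyr verb that takes row positions. The names keep, by_position and by_condition are illustrative:

```r
library(dplyr)

# Row positions where Ozone is not missing.
keep <- which(!is.na(airquality$Ozone))

# slice() selects rows by position.
by_position <- airquality %>% slice(keep)

# The same rows, selected with a logical condition instead.
by_condition <- airquality %>% filter(!is.na(Ozone))

print(nrow(by_position))
print(nrow(by_condition))
```

Both routes select the same rows; the filter(!is.na(Ozone)) form is shorter because it skips the intermediate list of positions.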
answered Aug 5, 2019 by anonymous
