How to remove NA values with dplyr filter

0 votes

Below is the code:

library(tidyverse)
df <- tibble(
    ~col1, ~col2, ~col3,
    1, 2, 3, 
    1, NA, 3, 
    NA, 2, 3
)

I can remove all NA observations with drop_na():

df %>% drop_na()

Or remove all NA observations in a single column (col1 for example):

df %>% drop_na(col1)

Why can't I just use a regular != filter pipe?

df %>% filter(col1 != NA)

Why do we have to use a special function from tidyr to remove NAs?

Apr 3, 2018 in Data Analytics by DataKing99
• 8,240 points
90,848 views

5 answers to this question.

0 votes
This has nothing to do specifically with dplyr::filter. But, any comparison with NA, including NA==NA will return NA.

R does not know about what you are doing in your analysis.

So, basically it does not allow comparison operators to think NA as a value.
answered Apr 3, 2018 by kappa3010
• 2,090 points
0 votes

Try this:

df %>% filter(!is.na(col1))
answered Mar 26, 2019 by anonymous
Thanks, that worked :)
This was simple, direct and perfect...thank you!
Thanks for your contribution!

In case you found the answer helpful do upvote the answer and increase your points!

Cheers!!!
What if we have 2 columns with possible na rows?
0 votes
Null values have no notion of equality in R. Therefore, NA == NA just returns NA. In fact, NA compared to any object in R will return NA. The filter statement in dplyr requires a boolean argument, so when it is iterating through col1, checking for inequality with filter(col1 != NA), the 'col1 != NA' command is continually throwing NA values for each row of col1. This is not a boolean, so the filter command does not evaluate properly.
answered Apr 11, 2019 by Zane
Thanks Zane! That was very well explained.
0 votes
Can we create a alist as below to find the rows which has no null values and then provide the list to dplyr function to filter rows with positions values??

Here na.omit(airquality$Ozone) will have values of not null values.

Then provide the list of positions to filter function?
answered Aug 5, 2019 by anonymous
0 votes

Hi,

The dplyr has ’filter()’ function to do such filtering, but there is even more. With dplyr you can do the kind of filtering, which could be hard to perform or complicated to construct with tools like SQL and traditional BI tools, in such a simple and more intuitive way. For example, we have one flight dataset and removing NA values with the filter keyword.

flight %>%
select(FL_DATE, CARRIER, ORIGIN, ORIGIN_CITY_NAME, ORIGIN_STATE_ABR, DEP_DELAY, DEP_TIME, ARR_DELAY, ARR_TIME) %>%
filter(!is.na(ARR_DELAY))
answered Dec 10, 2020 by MD
• 95,010 points

Related Questions In Data Analytics

0 votes
1 answer

How to use dplyr functions such as filter() inside nested data frames with map()

You can use map() call as follows:  map(full, ...READ MORE

answered Apr 6, 2018 in Data Analytics by Sahiti
• 6,360 points
1,829 views
0 votes
1 answer

How to replace NA values in a dataframe with Zero's ?

It is simple and easy: df1<-as.data.frame(matrix(sample(c(NA, 1:10), 100, ...READ MORE

answered Apr 10, 2018 in Data Analytics by CodingByHeart77
• 3,720 points
691 views
0 votes
2 answers

How to remove rows with missing values (NAs) in a data frame?

Hi, The below code returns rows without ...READ MORE

answered Aug 20, 2019 in Data Analytics by anonymous
• 32,890 points
12,452 views
0 votes
5 answers

How to remove NA values from a Vector in R?

Hello team, you can use na.omit x <- c(NA, 3, ...READ MORE

answered Dec 9, 2020 in Data Analytics by anonymous
• 82,540 points
75,469 views
0 votes
1 answer
0 votes
1 answer

How can I use parallel so that it preserves the list of data frames

You can use pmap as follows: nc <- ...READ MORE

answered Apr 4, 2018 in Data Analytics by kappa3010
• 2,090 points
145 views
0 votes
1 answer
0 votes
1 answer

How to join two tables (tibbles) by *list* columns in R

You can use the hash from digest ...READ MORE

answered Apr 5, 2018 in Data Analytics by kappa3010
• 2,090 points
424 views
0 votes
2 answers

How to subset rows containing NA in a chosen column of a data frame?

You can give this a try. subset(dataframe, is.na(dataframe$col2)) ...READ MORE

answered Aug 21, 2019 in Data Analytics by anonymous
• 32,890 points
3,755 views
0 votes
1 answer

How to print new lines with print() in R?

You can use cat() instead of writeLines(): ...READ MORE

answered May 3, 2018 in Data Analytics by kappa3010
• 2,090 points
117 views