How to remove NA values with dplyr filter

0 votes

Below is the code:

library(tidyverse)
df <- tibble(
    ~col1, ~col2, ~col3,
    1, 2, 3, 
    1, NA, 3, 
    NA, 2, 3
)

I can remove all NA observations with drop_na():

df %>% drop_na()

Or remove all NA observations in a single column (col1 for example):

df %>% drop_na(col1)

Why can't I just use a regular != filter pipe?

df %>% filter(col1 != NA)

Why do we have to use a special function from tidyr to remove NAs?

Apr 3, 2018 in Data Analytics by DataKing99
• 8,240 points
319,564 views

5 answers to this question.

0 votes

This is not specific to dplyr::filter(). Any comparison with NA, including NA == NA, returns NA.

NA stands for a missing (unknown) value, and R has no way of knowing what that value would have been in your analysis.

So comparison operators do not treat NA as an ordinary value that something can be equal or unequal to. Use is.na() to test for missing values instead.
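As a quick illustration of the point above, here is a minimal base R sketch. The vector x is a made-up example and is not part of the original question.

```r
# Comparisons that involve NA yield NA rather than TRUE or FALSE.
na_equals_na <- NA == NA        # NA
na_not_one   <- NA != 1         # NA

# x is a hypothetical vector used only for illustration.
x <- c(1, 1, NA)
compared <- x != NA             # NA NA NA: every element compares to NA
missing  <- is.na(x)            # FALSE FALSE TRUE

print(compared)
print(missing)
```

is.na() is the dedicated test for missingness, which is why it works where == and != do not.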


answered Apr 3, 2018 by kappa3010
• 2,090 points
+1 vote

Try this:

df %>% filter(!is.na(col1))
answered Mar 26, 2019 by anonymous
Thanks, that worked :)
This was simple, direct and perfect...thank you!
Thanks for your contribution!

If you found the answer helpful, please upvote it to increase your points!

Cheers!!!
What if we have 2 columns with possible na rows?
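One possible way to handle two columns, sketched with the df from the question. This assumes you want to keep only the rows where both col1 and col2 are present; the if_all() variant assumes dplyr 1.0.4 or later.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt here so the sketch is self-contained.
df <- tibble(
  col1 = c(1, 1, NA),
  col2 = c(2, NA, 2),
  col3 = c(3, 3, 3)
)

# Option 1: several conditions in filter() are combined with AND.
both_filter <- df %>% filter(!is.na(col1), !is.na(col2))

# Option 2: drop_na() accepts more than one column.
both_drop <- df %>% drop_na(col1, col2)

# Option 3: if_all() applies the same test to a selection of columns.
both_if_all <- df %>% filter(if_all(c(col1, col2), ~ !is.na(.x)))

print(both_filter)
```

All three keep only the first row here. To keep rows where at least one of the two columns is present, if_any() can be used in place of if_all().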
0 votes
Missing values (NA) have no notion of equality in R. Therefore, NA == NA just returns NA. In fact, comparing NA with any value in R returns NA. filter() in dplyr keeps only the rows for which its condition evaluates to TRUE, so when it checks col1 for inequality with filter(col1 != NA), the 'col1 != NA' comparison evaluates to NA for every row of col1. NA is neither TRUE nor FALSE, so filter() drops every row and returns an empty result instead of removing only the NA rows.
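To see this with the df from the question, here is a short sketch that compares the two filters. The data frame is rebuilt so the snippet runs on its own.

```r
library(dplyr)

df <- tibble(
  col1 = c(1, 1, NA),
  col2 = c(2, NA, 2),
  col3 = c(3, 3, 3)
)

# col1 != NA is NA for every row, so no row satisfies the condition.
no_rows <- df %>% filter(col1 != NA)
print(nrow(no_rows))   # 0

# is.na() returns TRUE or FALSE, so the filter behaves as intended.
kept <- df %>% filter(!is.na(col1))
print(nrow(kept))      # 2
```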
answered Apr 12, 2019 by Zane
Thanks Zane! That was very well explained.
0 votes
Can we build a list of the rows that have no NA values, as below, and then pass that list of positions to a dplyr function to filter the rows?

Here na.omit(airquality$Ozone) returns the values that are not NA.

Could the positions of those values then be passed to the filter function?
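A rough sketch of how the position-based idea could work, using the built-in airquality data. Note that na.omit() on a vector returns the remaining values, not their positions, so this sketch uses which(!is.na(...)) to get the positions and dplyr's slice() to select rows by position. That substitution is an assumption about what the question is after.

```r
library(dplyr)

# Positions of the rows where Ozone is not missing.
keep <- which(!is.na(airquality$Ozone))

# slice() selects rows by position; filter() expects a logical condition.
by_position  <- airquality %>% slice(keep)

# The same result without computing positions first.
by_condition <- airquality %>% filter(!is.na(Ozone))

print(nrow(by_position))
print(nrow(by_condition))
```

Both approaches return the same rows, so filtering directly on !is.na(Ozone) is usually the simpler choice.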
answered Aug 6, 2019 by anonymous
0 votes

Hi,

dplyr has the filter() function for this kind of filtering, and it can do more. With dplyr you can express filters that would be hard or complicated to construct with tools like SQL and traditional BI tools, in a simpler and more intuitive way. For example, given a flight dataset, the code below removes the rows where ARR_DELAY is NA using filter():

flight %>%
  select(FL_DATE, CARRIER, ORIGIN, ORIGIN_CITY_NAME, ORIGIN_STATE_ABR,
         DEP_DELAY, DEP_TIME, ARR_DELAY, ARR_TIME) %>%
  filter(!is.na(ARR_DELAY))
answered Dec 10, 2020 by MD
• 95,440 points
