How to drop factor levels in a subsetted data frame

0 votes

Consider a data frame containing a factor.

When I create a subset with subset() or another indexing function, I get a new data frame.

However, the factor variable retains all of its original levels, even those that no longer occur in the new data frame.

This causes problems when plotting or when using functions that rely on factor levels.

Is there a way to remove the unused levels from a factor in the new (subsetted) data frame?

Below is my example:

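# stringsAsFactors = TRUE is needed in R >= 4.0.0, where strings are no longer converted to factors by default
data <- data.frame(letters = letters[1:10],
                   numbers = 1:10,
                   stringsAsFactors = TRUE)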

levels(data$letters)
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"

subdata <- subset(data, numbers <= 5)
##   letters numbers
## 1       a       1
## 2       b       2
## 3       c       3
## 4       d       4
## 5       e       5
## But the levels are still there!
levels(subdata$letters)
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
Apr 17, 2018 in Data Analytics by Sahiti
• 6,370 points
5,438 views

1 answer to this question.

0 votes

For a single factor ff, calling factor(ff) rebuilds it with only the levels that actually occur:

factor(ff)
# drops the levels that do not occur
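
To make the effect concrete, here is a minimal sketch. The factor ff and the name sub_ff are made up for illustration; they are not from the question.

```r
# Hypothetical factor with three levels
ff <- factor(c("a", "b", "c"))

# Subsetting keeps every original level, even the unused one
sub_ff <- ff[ff != "c"]
levels(sub_ff)
## [1] "a" "b" "c"

# Re-creating the factor keeps only the levels that still occur
levels(factor(sub_ff))
## [1] "a" "b"
```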

For dropping levels from all factor columns in a data frame, you can use:

subdata <- subset(data, numbers <= 5)

##   letters numbers
## 1       a       1
## 2       b       2
## 3       c       3 
## 4       d       4
## 5       e       5 
# Rebuild every factor column so unused levels are dropped; other columns are left unchanged
subdata[] <- lapply(subdata, function(x) if (is.factor(x)) factor(x) else x)

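NOTE: The lapply() command above works on a data frame with any number of columns. If you prefer a single function call, base R (2.12.0 and later) provides droplevels(), which drops unused levels from a factor or from every factor column of a data frame. drop.levels() from the gdata package is another option.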
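
As an alternative sketch using only base R, droplevels() (available since R 2.12.0) can be applied to the whole data frame from the question. The stringsAsFactors = TRUE argument is an assumption added here so that the example also behaves the same on R 4.0.0 and later.

```r
# Recreate the data frame from the question with letters as a factor
data <- data.frame(letters = letters[1:10],
                   numbers = 1:10,
                   stringsAsFactors = TRUE)

subdata <- subset(data, numbers <= 5)

# Drop unused levels from every factor column at once
subdata <- droplevels(subdata)

levels(subdata$letters)
## [1] "a" "b" "c" "d" "e"
```

droplevels() also accepts a single factor, so droplevels(subdata$letters) should behave like factor(subdata$letters) here.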

answered Apr 17, 2018 by kappa3010
• 2,090 points

edited Apr 17, 2018 by kappa3010
