How to drop factor levels in a subsetted data frame?

0 votes

Consider a data frame containing a factor.

When I create a subset with subset() or any other indexing function, a new data frame is created.

I have observed that the factor variable retains all of its original levels, even if they do not exist in the new data frame.

This creates problems when plotting or when using functions that rely on factor levels.

Is there any way to remove the unused levels from a factor in the new data frame, i.e. the data frame I have taken a subset of?

Below is my example:

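# stringsAsFactors = TRUE is needed in R >= 4.0, where strings are no longer converted to factors by default
data <- data.frame(letters = letters[1:10],
                   numbers = 1:10,
                   stringsAsFactors = TRUE)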

levels(data$letters)
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"

subdata <- subset(data, numbers <= 5)
##   letters numbers
## 1       a       1
## 2       b       2
## 3       c       3
## 4       d       4
## 5       e       5
## But the levels are still there!
levels(subdata$letters)
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
Apr 17, 2018 in Data Analytics by darklord
• 6,140 points
582 views

1 answer to this question.

0 votes

You can call factor() on an existing factor to drop the levels that do not occur. Here ff stands for any factor:

factor(ff)
# drops the levels that do not occur
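
To make the effect of factor() concrete, here is a minimal sketch. The factor ff and the name sub_ff are made up for illustration and are not part of the original question:

```r
# Hypothetical factor with ten levels, standing in for `ff`
ff <- factor(letters[1:10])

# Subsetting keeps the values "a".."e" but all ten levels stay attached
sub_ff <- ff[1:5]
nlevels(sub_ff)
## [1] 10

# Re-creating the factor keeps only the levels that actually occur
sub_ff <- factor(sub_ff)
levels(sub_ff)
## [1] "a" "b" "c" "d" "e"
```

Note that factor() rebuilds the levels from the values that are present, so the result has to be assigned back for the change to stick.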

For dropping levels from all factor columns in a data frame, you can use:

subdata <- subset(data, numbers <= 5)

##   letters numbers
## 1       a       1
## 2       b       2
## 3       c       3 
## 4       d       4
## 5       e       5 
subdata[] <- lapply(subdata, function(x) if(is.factor(x)) factor(x) else x)

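NOTE: The command above works for a data frame with any number of columns, since lapply() visits every column and only rebuilds the ones that are factors. Base R also provides droplevels(), which does the same thing for a single factor or for all factor columns of a data frame. drop.levels() from the gdata package is another option if you already use that package.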
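
As a short sketch of the base R alternative, droplevels() can be applied to the whole subsetted data frame. The data frame below mirrors the one from the question, with stringsAsFactors = TRUE added on the assumption that you are running R 4.0 or later:

```r
# Same example data as in the question; the letters column is a factor
data <- data.frame(letters = letters[1:10],
                   numbers = 1:10,
                   stringsAsFactors = TRUE)

subdata <- subset(data, numbers <= 5)

# Drop unused levels from every factor column in one call
subdata <- droplevels(subdata)

levels(subdata$letters)
## [1] "a" "b" "c" "d" "e"
```

Non-factor columns such as numbers are left untouched, so this is safe to call on a mixed data frame.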

answered Apr 17, 2018 by kappa3010
• 2,010 points

edited Apr 17, 2018 by kappa3010

© 2018 Brain4ce Education Solutions Pvt. Ltd. All rights Reserved.