How to drop factor levels in a subsetted data frame?

0 votes

Consider a data frame containing a factor.

When I create a subset with subset() or another indexing function, I get a new data frame.

However, the factor variable keeps all of its original levels, even those that no longer occur in the new data frame.

This causes problems when plotting or when using functions that rely on factor levels.

Is there a way to remove the unused levels from the factor in the subsetted data frame?

Below is my example:

data <- data.frame(letters = letters[1:10],
                   numbers = 1:10,
                   stringsAsFactors = TRUE)  # needed in R >= 4.0.0 so that letters becomes a factor

levels(data$letters)
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"

subdata <- subset(data, numbers <= 5)
##   letters numbers
## 1       a       1
## 2       b       2
## 3       c       3
## 4       d       4
## 5       e       5
## But the levels are still there!
levels(subdata$letters)
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
Apr 17, 2018 in Data Analytics by darklord

1 answer to this question.

0 votes

Calling factor(ff) on an existing factor ff re-creates it and drops the levels that do not occur:

factor(ff)      
# drops the levels that do not occur
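
As a minimal sketch of that behaviour, assuming ff is a small factor made up here purely for illustration:

```r
# ff is a hypothetical example factor with three levels
ff <- factor(c("a", "b", "c"))

# Subsetting keeps every original level, including the now unused "c"
sub_ff <- ff[ff != "c"]
levels(sub_ff)
## [1] "a" "b" "c"

# Re-creating the factor keeps only the levels that actually occur
sub_ff <- factor(sub_ff)
levels(sub_ff)
## [1] "a" "b"
```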

For dropping levels from all factor columns in a data frame, you can use:

subdata <- subset(data, numbers <= 5)

##   letters numbers
## 1       a       1
## 2       b       2
## 3       c       3 
## 4       d       4
## 5       e       5 
subdata[] <- lapply(subdata, function(x) if(is.factor(x)) factor(x) else x)

NOTE: The command above works on a data frame with any number of columns, because lapply() visits every column and only rebuilds the ones that are factors. drop.levels() from the gdata package is an alternative if you already use that package.
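
Base R also has droplevels(), which accepts a whole data frame. The sketch below reuses the data from the question; the stringsAsFactors = TRUE argument is an addition made here on the assumption that you are running R 4.0.0 or later, where strings are no longer converted to factors by default:

```r
# Same example data as in the question, with the factor conversion made explicit
data <- data.frame(letters = letters[1:10],
                   numbers = 1:10,
                   stringsAsFactors = TRUE)
subdata <- subset(data, numbers <= 5)

# droplevels() removes unused levels from every factor column at once
subdata <- droplevels(subdata)
levels(subdata$letters)
## [1] "a" "b" "c" "d" "e"
```

Non-factor columns such as numbers are left unchanged.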

answered Apr 17, 2018 by kappa3010

edited Apr 17, 2018 by kappa3010
