How to create dummy variables based on a categorical variable of lists in R?

0 votes

I have a data frame with a categorical variable that holds lists of strings of varying lengths. Consider the example below:

data <- data.frame(x = 1:5)
data$y <- list("A", c("A", "B"), "C", c("B", "D", "C"), "E")
data
  x       y
1 1       A
2 2    A, B
3 3       C
4 4 B, D, C
5 5       E

The desired result is one dummy variable for each unique string that appears anywhere in data$y, i.e.:

data.frame(x = 1:5, A = c(1,1,0,0,0), B = c(0,1,0,1,0), C = c(0,0,1,1,0), D = c(0,0,0,1,0), E = c(0,0,0,0,1))
  x A B C D E
1 1 1 0 0 0 0
2 2 1 1 0 0 0
3 3 0 0 1 0 0
4 4 0 1 1 1 0
5 5 0 0 0 0 1

My current approach, shown below, is very slow on large data frames:

unique_Strings <- unique(unlist(data$y))
n <- ncol(data)
for (i in seq_along(unique_Strings)) {
  # 1 if the i-th unique string occurs in the row's vector, 0 otherwise
  data[, n + i] <- sapply(data$y, function(x) ifelse(unique_Strings[i] %in% x, 1, 0))
  colnames(data)[n + i] <- unique_Strings[i]
}

Any suggestions on how I can improve my code?

Apr 13, 2018 in Data Analytics by darklord
• 6,140 points
500 views

1 answer to this question.

0 votes

You can use mtabulate() from the qdapTools package as follows:

library(qdapTools)
cbind(data[1], mtabulate(data$y))
#  x A B C D E
#1 1 1 0 0 0 0
#2 2 1 1 0 0 0
#3 3 0 0 1 0 0
#4 4 0 1 1 1 0
#5 5 0 0 0 0 1
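
If installing an extra package is not an option, the same table can probably be built in base R. The following is only a minimal sketch, not part of the original answer: the names levels_y, dummies and result are made up for illustration, and it records presence or absence (0/1) as requested in the question, so a string repeated within one row still yields 1.

```r
# Rebuild the example data from the question
data <- data.frame(x = 1:5)
data$y <- list("A", c("A", "B"), "C", c("B", "D", "C"), "E")

# All distinct strings, sorted so the column order is predictable
# (levels_y is a hypothetical name used only in this sketch)
levels_y <- sort(unique(unlist(data$y)))

# For each list element, flag which of the distinct strings it contains,
# then stack the flag vectors into a matrix with one row per observation
dummies <- do.call(
  rbind,
  lapply(data$y, function(v) as.integer(levels_y %in% v))
)
colnames(dummies) <- levels_y

# Attach the dummy columns to the remaining (non-list) column
result <- cbind(data["x"], dummies)
result
```

Each row needs only one vectorised %in% call against all unique strings, rather than one pass over the whole column per unique string as in the loop from the question. Whether that is fast enough on a large data frame would need to be measured.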
answered Apr 13, 2018 by CodingByHeart77
• 3,680 points
