How to one-hot encode several categorical variables in R

I'm working on a prediction problem and am using R to design a decision tree. I have multiple categorical variables that I'd like to one-hot encode consistently in both my training and testing sets. I was able to achieve that with my training data by:

temps <- X_train
tt <- subset(temps, select = -output)
oh <- data.frame(model.matrix(~ . - 1, tt), CLASS = temps$output)

But I can't seem to figure out how to apply the same encoding to my testing set; what can I do?
Jun 19, 2022 in Data Analytics by Avinash

1 answer to this question.

I recommend using the dummyVars function from the caret package:

customers <- data.frame(
  id = c(10, 20, 30, 40, 50),
  gender = c("male", "female", "female", "male", "female"),
  mood = c("happy", "sad", "happy", "sad", "happy"),
  outcome = c(1, 1, 0, 0, 0),
  stringsAsFactors = TRUE  # strings are not factors by default since R 4.0
)
customers
#   id gender  mood outcome
# 1 10   male happy       1
# 2 20 female   sad       1
# 3 30 female happy       0
# 4 40   male   sad       0
# 5 50 female happy       0

library(caret)

# dummify (one-hot encode) the data
dmy <- dummyVars(" ~ .", data = customers)
trsf <- data.frame(predict(dmy, newdata = customers))
trsf
#   id gender.female gender.male mood.happy mood.sad outcome
# 1 10             0           1          1        0       1
# 2 20             1           0          0        1       1
# 3 30             1           0          1        0       0
# 4 40             0           1          0        1       0
# 5 50             1           0          1        0       0

Example source

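Apply the same procedure to both your training and testing sets: build the dummyVars object once, then call predict() with it on each set so that both end up with the same columns.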
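As a minimal sketch of applying one encoding to both sets, the example below fits dummyVars on a training frame and reuses the fitted object on a test frame. The train/test split here is hypothetical (the customers rows divided by hand, with the outcome column left out so only predictors are encoded), and it assumes caret is installed.

```r
library(caret)  # assumed to be installed; provides dummyVars()

# Hypothetical split of the customers example: three rows for training,
# two for testing. In practice these would be your own train/test frames.
train <- data.frame(
  id     = c(10, 20, 40),
  gender = factor(c("male", "female", "male")),
  mood   = factor(c("happy", "sad", "sad"))
)
# The test rows happen to contain only "female" and "happy".
test <- data.frame(
  id     = c(30, 50),
  gender = factor(c("female", "female")),
  mood   = factor(c("happy", "happy"))
)

# Learn the encoding (variables and factor levels) from the training set only.
dmy <- dummyVars(" ~ .", data = train)

# Reuse the same fitted object for both sets.
train_oh <- data.frame(predict(dmy, newdata = train))
test_oh  <- data.frame(predict(dmy, newdata = test))

# Both frames share the same columns, in the same order,
# even though the test rows never contain "male" or "sad".
stopifnot(identical(names(train_oh), names(test_oh)))
names(test_oh)
```

The outcome column can be added back afterwards, as in the question (for example with CLASS = temps$output). One caveat: if the test set contains a level that never appears in the training data, I would expect predict() to stop with an error about new factor levels, so such rows need to be handled before encoding.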

answered Jun 24, 2022 by Sohail