Building a random forest on a dataset containing missing (NA) values

I have a modified copy of the "iris" dataset that contains missing values:

iris1 <- iris
iris1$Sepal.Length[c(1, 3, 57, 103)] <- NA

I want to build a random forest model on it:

randomForest(Species~Sepal.Length,data=iris1)

But I get this error:

Error in na.fail.default(list(Species = c(1L, 1L, 1L, 1L, 1L, 1L, 1L,  : missing values in object

Is there a way to build a random forest model on data with missing values?

Apr 2, 2018 in Data Analytics by nirvana, edited Apr 2, 2018 by nirvana

1 answer to this question.

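By default, randomForest() uses na.action = na.fail, so it stops as soon as the data contains an NA. You have two options: impute the missing values, or omit the rows that contain them.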

To impute missing values in the predictors, use the rfImpute() function from the randomForest package. It requires a response without missing values, which holds here because only Sepal.Length contains NAs:

iris1 <- rfImpute(Species ~ ., data = iris1)

With the imputed data in place, fit the random forest on iris1:

randomForest(Species~Sepal.Length,data=iris1)
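
The imputation route can be sketched end to end as follows. This is a minimal example under a few assumptions: the seed value and the names iris_imputed and fit_imputed are illustrative choices, not part of the original answer, and the imputed data is stored in a new object so that iris1 stays available for comparison.

```r
library(randomForest)

# Recreate the data from the question: a copy of iris with four NAs.
iris1 <- iris
iris1$Sepal.Length[c(1, 3, 57, 103)] <- NA

# rfImpute() relies on random forests internally, so fix the seed
# (an arbitrary value) to make the imputed values reproducible.
set.seed(42)
iris_imputed <- rfImpute(Species ~ ., data = iris1)

# Check that no missing values remain before fitting.
stopifnot(!anyNA(iris_imputed))

# Fit the model from the question on the imputed data.
fit_imputed <- randomForest(Species ~ Sepal.Length, data = iris_imputed)
print(fit_imputed)
```

rfImpute() returns a data frame with the response as its first column, so the column order differs from iris1 even though the column names are the same.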

If your dataset has only a few missing values, you can instead drop the incomplete rows with the na.omit() function:

iris1 <- na.omit(iris1)

After removing those rows, fit the random forest on the "iris1" dataset:

randomForest(Species~Sepal.Length,data=iris1)
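
The omission route might look like the sketch below, which also shows a variant that passes na.action = na.omit to randomForest() instead of modifying the data first. The names iris_complete, n_dropped, fit_a and fit_b and the seed value are illustrative assumptions, not part of the original answer.

```r
library(randomForest)

# Recreate the data from the question: a copy of iris with four NAs.
iris1 <- iris
iris1$Sepal.Length[c(1, 3, 57, 103)] <- NA

# Option A: drop incomplete rows up front and keep a cleaned copy.
iris_complete <- na.omit(iris1)
n_dropped <- nrow(iris1) - nrow(iris_complete)  # 4 rows

set.seed(42)  # arbitrary seed, for reproducibility only
fit_a <- randomForest(Species ~ Sepal.Length, data = iris_complete)

# Option B: leave iris1 untouched and drop NA rows while the model
# frame is built, by overriding the default na.action.
set.seed(42)
fit_b <- randomForest(Species ~ Sepal.Length, data = iris1,
                      na.action = na.omit)
```

Here both options discard the same four rows. They can differ on other data: na.omit(iris1) removes rows with an NA in any column, while na.action = na.omit only considers the variables named in the formula.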
answered Apr 2, 2018 by Bharani
