Building a Random Forest on a dataset containing missing (NA) values


I have a modified copy of the "iris" dataset that contains missing values:

iris1 <- iris  # start from a copy of the built-in iris data
iris1$Sepal.Length[c(1, 3, 57, 103)] <- NA

I want to fit a random forest model on it:

library(randomForest)
randomForest(Species ~ Sepal.Length, data = iris1)

But I get this error:

Error in na.fail.default(list(Species = c(1L, 1L, 1L, 1L, 1L, 1L, 1L,  : missing values in object

Is there a way to fit a random forest on data that contains missing values?

Apr 2, 2018 in Data Analytics by nirvana
• 3,060 points

edited Apr 2, 2018 by nirvana 342 views

1 answer to this question.


You have two options: impute the missing values, or omit the rows that contain them.

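If you want to impute the missing values in the predictor data, you can use the rfImpute() function from the randomForest package. Note that rfImpute() does not allow NAs in the response (here, Species).

The command below imputes the missing values in the predictors and returns a data frame with the response as its first column:

iris1 <- rfImpute(Species ~ ., data = iris1)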

Now you can call randomForest() to fit the model on the imputed iris1 dataset:

randomForest(Species ~ Sepal.Length, data = iris1)
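
As a rough end-to-end sketch of the imputation route: the snippet below rebuilds the modified dataset from the question, imputes it, checks that no NA remains, and then fits the model. It assumes the randomForest package is installed; the seed value and the names iris_imputed and fit are illustrative choices, not part of the original post.

```r
library(randomForest)

set.seed(42)  # arbitrary seed, only so repeated runs give the same imputation

# Recreate the dataset from the question
iris1 <- iris
iris1$Sepal.Length[c(1, 3, 57, 103)] <- NA

# Impute predictor NAs using random forest proximities;
# the response (Species) comes back as the first column
iris_imputed <- rfImpute(Species ~ ., data = iris1)

# Confirm nothing is missing any more and inspect the filled-in values
anyNA(iris_imputed)
iris_imputed[c(1, 3, 57, 103), "Sepal.Length"]

# Fit the model on the completed data
fit <- randomForest(Species ~ Sepal.Length, data = iris_imputed)
print(fit)
```

Storing the result under a new name keeps the original iris1 (with its NAs) available for comparison.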

If there are only a few missing values in your dataset, you can instead remove the affected rows with the na.omit() function:

iris1 <- na.omit(iris1)
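
To see what na.omit() actually drops, here is a minimal base-R sketch that compares row counts before and after. The name iris_complete is illustrative. Keep in mind that na.omit() removes every row with an NA in any column, not only in the columns used by the model.

```r
# Recreate the dataset from the question
iris1 <- iris
iris1$Sepal.Length[c(1, 3, 57, 103)] <- NA

sum(is.na(iris1$Sepal.Length))  # 4 missing values

# Drop every row that has an NA in any column
iris_complete <- na.omit(iris1)

nrow(iris1)          # 150
nrow(iris_complete)  # 146
```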

After removing the missing values, you can fit the random forest on the "iris1" dataset:

randomForest(Species ~ Sepal.Length, data = iris1)
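
A further option, not covered above, may be worth trying. The formula interface of randomForest() defaults to na.action = na.fail, which is what raises the error in the question. Passing na.action = na.roughfix (also from the randomForest package) fills numeric NAs with the column median and factor NAs with the most frequent level before fitting. The sketch below assumes the package is installed; the seed is arbitrary.

```r
library(randomForest)

set.seed(42)  # arbitrary seed for reproducibility

# Recreate the dataset from the question
iris1 <- iris
iris1$Sepal.Length[c(1, 3, 57, 103)] <- NA

# Let randomForest() apply a quick median/mode fill while building the model frame
fit <- randomForest(Species ~ Sepal.Length, data = iris1,
                    na.action = na.roughfix)
print(fit)
```

This is a cruder fill than rfImpute(), but it keeps all rows and needs no separate preprocessing step.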
answered Apr 2, 2018 by Bharani
• 4,560 points
