Business Analytics Decision Tree in R
Understanding R Programming Language
‘Business Analytics with R’ at Edureka prepares you to perform analytics and build models for real-world data science problems. R is one of the most powerful programming languages for statistical computing and graphics, making it a must-know language for aspiring Data Scientists. R wins strongly on statistical capability, graphical capability, cost, and its rich set of packages. This video shows you how to build a business analytics decision tree in R.
Decision Tree Data & Code
# Fitting classification tree models

# Load the company data file from the folder
file_path <- "/home/abhay/Downloads/decision_tree/Company_Data.csv"
Company_Data <- read.csv(file_path)

# Attach the company dataset to the R session
attach(Company_Data)
names(Company_Data)

# Find the range of the Sales variable
range(Sales)
fivenum(Sales)

# Create a categorical variable based on Sales:
# if Sales is 8 or higher, label it "High", else "Low"
# (as.factor is needed so tree() fits a classification tree)
Sales_Cat_Var <- as.factor(ifelse(Sales >= 8, "High", "Low"))

# Check how many rows are in the "Sales_Cat_Var" variable
length(Sales_Cat_Var)

# Check the dimensions of Company_Data
dim(Company_Data)

# Check the first few rows of Company_Data
head(Company_Data)

# Append Sales_Cat_Var to the Company_Data dataset
Company_Data <- data.frame(Company_Data, Sales_Cat_Var)

# Remove the Sales column from the dataset
Company_Data <- Company_Data[, -1]

# Check column names
names(Company_Data)

# File to be used in Rattle
write.csv(Company_Data, file = "Company_Data.csv")

# The above file can be found at the location returned by
getwd()

# Split the data into training and testing datasets
set.seed(2)
train <- sample(1:nrow(Company_Data), nrow(Company_Data) / 2)
length(train)
test <- -train
training_data <- Company_Data[train, ]
head(training_data)
testing_data <- Company_Data[test, ]
head(testing_data)
testing_Sales_Cat_Var <- Sales_Cat_Var[test]

# Fit the tree model using the training data
library(tree)
Mod_Decision_Tree <- tree(Sales_Cat_Var ~ ., training_data)

# Plot the tree
plot(Mod_Decision_Tree)
text(Mod_Decision_Tree, pretty = 0)  # pretty = 0 prints the actual category names for categorical variables

# Check how the model is doing on the testing data
# type = "class" asks predict() for class labels
Decision_tree_prediction <- predict(Mod_Decision_Tree, testing_data, type = "class")

# Misclassification error
mean(Decision_tree_prediction != testing_Sales_Cat_Var)

# Since this error is high, we prune the tree
# Cross-validation tells us how many levels up to go, i.e. where to stop pruning
set.seed(3)  # fix the sample
# cv.tree is the cross-validation function
Cross_Val_Tree <- cv.tree(Mod_Decision_Tree, FUN = prune.misclass)
names(Cross_Val_Tree)

# Plot tree size against error (dev is the error rate)
plot(Cross_Val_Tree$size, Cross_Val_Tree$dev, type = "b")

# Prune the tree based on the error from the plot above
Decision_Tree_P_Model <- prune.misclass(Mod_Decision_Tree, best = 9)
plot(Decision_Tree_P_Model)
text(Decision_Tree_P_Model, pretty = 0)

# Check again how the pruned tree is doing
Decision_tree_prediction <- predict(Decision_Tree_P_Model, testing_data, type = "class")
mean(Decision_tree_prediction != testing_Sales_Cat_Var)
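Two small follow-ups to the walkthrough above, sketched on toy stand-in data so they run on their own: picking the pruning size programmatically instead of hard-coding best = 9, and summarising errors with a confusion matrix rather than only the overall misclassification rate. The vectors below are hypothetical examples, not output from Company_Data.

```r
# 1) Pick the pruning size from cv.tree() output programmatically.
#    Toy stand-ins for Cross_Val_Tree$size and Cross_Val_Tree$dev above.
cv_size <- c(1, 3, 5, 7, 9, 12)
cv_dev  <- c(120, 95, 80, 66, 55, 60)
best_size <- cv_size[which.min(cv_dev)]   # size with the lowest CV error
print(best_size)                          # 9 on this toy data
# In the walkthrough: prune.misclass(Mod_Decision_Tree, best = best_size)

# 2) A confusion matrix shows *which* class the model gets wrong.
#    Toy stand-ins for Decision_tree_prediction and testing_Sales_Cat_Var.
predicted <- factor(c("High", "Low", "Low",  "High", "Low", "High"))
actual    <- factor(c("High", "Low", "High", "High", "Low", "Low"))
conf_mat <- table(Predicted = predicted, Actual = actual)
print(conf_mat)

# Accuracy is the sum of the diagonal over the total count
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(accuracy)   # 4 correct out of 6
```

With the real model, the same two lines (table() and the diag() ratio) can be applied directly to Decision_tree_prediction and testing_Sales_Cat_Var.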
Got a question for us? Please mention it in the comments section and we will get back to you.