10 Jun 2015

Business Analytics Decision Tree in R

Understanding R Programming Language

‘Business Analytics with R’ at Edureka prepares you to perform analytics and build models for real-world data science problems. R is among the most powerful programming languages for statistical computing and graphics, which makes it a must-know language for aspiring data scientists. R wins strongly on statistical capability, graphical capability, cost, and its rich set of packages. This video shows how to build a decision tree for business analytics in R.

Decision Tree Data & Code

# Fitting classification Tree models
# Load company data file from folder
file_path <- "/home/abhay/Downloads/decision_tree/Company_Data.csv"
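# stringsAsFactors = TRUE reads any text columns as factors, which tree() needs
# (R 4.0 and later no longer do this by default)
Company_Data <- read.csv(file_path, stringsAsFactors = TRUE)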
# Attach company dataset in R-Session
attach(Company_Data)
names(Company_Data)
# To find out the range of Sales variable
range(Sales)
fivenum(Sales)
# Create a categorical variable from Sales: "High" if Sales is 8 or more,
# otherwise "Low"; stored as a factor so tree() fits a classification tree
Sales_Cat_Var <- as.factor(ifelse(Sales >= 8, "High", "Low"))
# Check how many rows are there in "Sales_Cat_Var" variable
length(Sales_Cat_Var)
# To check the dimensions of Company_Data
dim(Company_Data)
# To check the first few rows of Company_Data
head(Company_Data)
# Append Sales_Cat_Var to the Company_Data dataset
Company_Data <- data.frame(Company_Data, Sales_Cat_Var)
# Remove the Sales column (assumed to be the first column) so it is not used as a predictor
Company_Data <- Company_Data[,-1]
# Check column names
names(Company_Data)
# File to be used in Rattle
write.csv(Company_Data, file = "Company_Data.csv")
# The file above is written to the current working directory
getwd()
# Split data into training & testing datasets
set.seed(2)
train <- sample(1:nrow(Company_Data),nrow(Company_Data)/2)
length(train)
test <- -train
training_data <- Company_Data[train,]
head(training_data)
testing_data <- Company_Data[test,]
head(testing_data)
testing_Sales_Cat_Var <- Sales_Cat_Var[test]
# fit the tree model using training data
library(tree)
Mod_Decision_Tree <- tree(Sales_Cat_Var ~ ., training_data)
# Plot the tree
plot(Mod_Decision_Tree)
text(Mod_Decision_Tree, pretty = 0) # pretty = 0 prints the full level names of categorical variables
# Check how the model performs on the testing data
# type = "class" returns the predicted class labels
Decision_tree_prediction <- predict(Mod_Decision_Tree, testing_data, type = "class")
# Compute the misclassification error
mean(Decision_tree_prediction != testing_Sales_Cat_Var)
# Since this error is high, we need to prune the tree
# Cross-validation shows how far to prune, i.e. what tree size to keep
# Set the seed so the cross-validation result is reproducible
set.seed(3)
# cv.tree() runs the cross-validation; FUN = prune.misclass prunes by misclassification instead of deviance
Cross_Val_Tree <- cv.tree(Mod_Decision_Tree, FUN = prune.misclass)
names(Cross_Val_Tree)
# Plot tree size against error
# With FUN = prune.misclass, dev is the number of cross-validation misclassifications
plot(Cross_Val_Tree$size, Cross_Val_Tree$dev, type = "b")
# Prune the tree to the size chosen from the plot above (9 terminal nodes here)
Decision_Tree_P_Model <- prune.misclass(Mod_Decision_Tree, best = 9)
plot(Decision_Tree_P_Model)
text(Decision_Tree_P_Model, pretty = 0)
# Check again how the pruned tree performs on the testing data
Decision_tree_prediction <- predict(Decision_Tree_P_Model, testing_data, type = "class")
mean(Decision_tree_prediction != testing_Sales_Cat_Var)
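
The script above reads the pruning size off the plot and reports a single error number. As a rough sketch of two small additions, the snippet below builds a confusion matrix next to the misclassification rate and picks the tree size with the lowest cross-validation error programmatically. All values in it are hypothetical toy data, not results from Company_Data: `actual` and `predicted` stand in for testing_Sales_Cat_Var and Decision_tree_prediction, and `cv_result` is a plain list that only mimics the `size` and `dev` fields returned by cv.tree().

```r
# Hypothetical toy labels (NOT output from the Company_Data model);
# they stand in for testing_Sales_Cat_Var and Decision_tree_prediction.
actual    <- factor(c("High", "Low", "Low",  "High", "Low", "Low"),
                    levels = c("High", "Low"))
predicted <- factor(c("High", "Low", "High", "Low",  "Low", "Low"),
                    levels = c("High", "Low"))

# Confusion matrix: rows are predicted classes, columns are actual classes
confusion <- table(Predicted = predicted, Actual = actual)
print(confusion)

# Misclassification rate, computed directly and from the confusion matrix
error_rate       <- mean(predicted != actual)
error_from_table <- 1 - sum(diag(confusion)) / sum(confusion)

# Hypothetical cross-validation summary with the same field names
# (size, dev) that cv.tree() returns; the numbers are made up.
cv_result <- list(size = c(12, 9, 6, 3, 1),
                  dev  = c(60, 52, 55, 58, 80))

# Tree size with the smallest cross-validation error
best_size <- cv_result$size[which.min(cv_result$dev)]
print(best_size)
```

With the real objects, the same pattern would be table(Decision_tree_prediction, testing_Sales_Cat_Var) and Cross_Val_Tree$size[which.min(Cross_Val_Tree$dev)]. Note that which.min() returns the first minimum, so when several sizes tie it is still worth looking at the plot before settling on a value for best.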

Got a question for us? Please mention it in the comments section and we will get back to you.

Comments
4 Comments
  • Al

    Could we get the Decision Tree Data & Code to recreate output please?

    • EdurekaSupport

      Hi Al, we have updated the post with the code.

      • Sudheer

        Where can I download the company data from ?

        • EdurekaSupport

          Hi Sudheer,
          Thank you for reaching out to us.
          We would recommend that you get in touch with us for further clarification by contacting our sales team on +91-8880862004 (India) or 1800 275 9730 (US toll free). You can mail us on sales@edureka.co.
