When scoring a logistic regression model, is having the predicted variable in the test dataset mandatory?

0 votes
Please help explain the scoring process for a model that was built on a training data set and that I now want to apply to a test data set to get the final results.
Oct 15, 2018 in Data Analytics by Edureka
• 120 points
59 views
Follow-up question: I have created the model and the confusion matrix, and now I want to apply this model to a different, new TEST data set. What should my approach be, and how can I predict yes/no for each record of the new dataset?

2 answers to this question.

0 votes

We'll need a target variable to compare the predicted output with the actual values. If there is no output column, you can just create one:

In R, use: test_set$Output_Column_Name <- NA
In Python (pandas, with numpy imported as np), use: test_set['Output_Column_Name'] = np.nan

Next, after creating the target column, predict the output into it using the logistic regression model you created.

Once you have the predicted target values, which are binary in nature (either YES/NO or 1/0), compare them with the actual values using a confusion matrix.

A confusion matrix takes the following values into account:

  • True Positive: number of instances in which the actual and the predicted value both are True
  • True Negative: number of instances in which the actual and the predicted value both are False
  • False Positive: number of instances in which the actual is False but the predicted value is True
  • False Negative: number of instances in which the actual is True but the predicted value is False

Based on the counts of the above values, the accuracy of the model is calculated with the formula:
(True Positive + True Negative) / Total number of instances
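
As a rough illustration of the four counts and the accuracy formula above, here is a minimal Python sketch using only the standard library. The function names confusion_counts and accuracy, and the label lists, are made up for this example; they are not from any particular package.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP, FN for two equal-length lists of binary labels."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1          # actual True, predicted True
        elif a != positive and p != positive:
            tn += 1          # actual False, predicted False
        elif a != positive and p == positive:
            fp += 1          # actual False, predicted True
        else:
            fn += 1          # actual True, predicted False
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """(True Positive + True Negative) / total number of instances."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Made-up labels, purely for illustration (1 = YES, 0 = NO)
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(actual, predicted)
print(tp, tn, fp, fn)              # 3 3 1 1
print(accuracy(tp, tn, fp, fn))    # 0.75
```

Note that both lists are required here: without the actual values there is nothing to count against, which is why accuracy can only be measured on data where the target is known.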

For more details about the confusion matrix, you can refer to the following link: https://bit.ly/2RSDtW8

Hope this helps :)

answered Oct 16, 2018 by Anmol
• 1,610 points
0 votes

Answer to your follow-up question:

We can never measure the accuracy of a model without the actual values. If you have created a predictive model and are running it on new data whose target variable values are unknown, then you are actually deploying the model to predict the outcome for that new data.
Coming to the second part of your question, "how can I predict yes/no for each record of the new dataset?":
Once the model is created, you can again create a dummy column with NA values in the new dataset and then use the code below:
predictions <- predict(model_name, newdata = new_test_data, type = 'response')
The predictions object returned by this command holds one value for each instance/row you passed in. Assuming the model was fitted with glm as a logistic regression, type = 'response' returns predicted probabilities, so apply a cutoff (0.5 is a common choice) to turn each value into yes/no.
new_test_data can have any number of rows; even a single row works.
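
To show what scoring a new record involves when no target values are available, here is a small Python sketch under stated assumptions. The intercept, the coefficients, and the column names age and visits are hypothetical stand-ins for a fitted model, and the 0.5 cutoff is just a common default; none of these come from the question.

```python
import math

# Hypothetical coefficients standing in for an already fitted logistic model
INTERCEPT = -1.5
COEFS = {"age": 0.04, "visits": 0.3}


def predict_probability(row):
    """Logistic function applied to the linear combination of the inputs."""
    z = INTERCEPT + sum(COEFS[name] * row[name] for name in COEFS)
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(row, cutoff=0.5):
    """Turn the predicted probability into yes/no using a cutoff."""
    return "yes" if predict_probability(row) >= cutoff else "no"


# New records with no target column at all; one row would work just as well
new_rows = [
    {"age": 25, "visits": 1},
    {"age": 60, "visits": 4},
]

labels = [predict_label(row) for row in new_rows]
print(labels)  # ['no', 'yes']
```

Only the predictor columns are used to produce each label, so the target variable is not needed for scoring itself. It becomes necessary only when you want to check the predictions against actual outcomes.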
answered Oct 17, 2018 by Anmol
• 1,610 points
