Screening for multicollinearity in a regression model

0 votes

I hope this isn't going to be an "ask-and-answer" question... here goes: (multi)collinearity refers to extremely high correlations between predictors in a regression model. As for how to cure it: sometimes you don't need to "cure" collinearity at all, since it doesn't affect the regression model's overall fit, only the interpretation of the effects of individual predictors.

One way to spot collinearity is to take each predictor in turn as the dependent variable, with the remaining predictors as independent variables, and determine R2; if it's larger than .9 (or .95), we can consider that predictor redundant. That's one "method"... what about other approaches? Some of them are time-consuming, like excluding predictors from the model one at a time and watching the b-coefficients: under collinearity they change noticeably.

Of course, we must always bear in mind the specific context/goal of the analysis... Sometimes the only remedy is to repeat the research, but right now I'm interested in the various ways of screening for redundant predictors when (multi)collinearity occurs in a regression model.
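The R2-based screening described above can be sketched in a few lines of base R. This is a minimal illustration on simulated data (the variable names and the .9 cutoff are just the ones from the description above, not a fixed convention):

```r
# Regress each predictor on the remaining predictors and record R^2.
set.seed(1)
x1 <- rnorm(100)
x2 <- rnorm(100)
x3 <- x1 + 2 * x2 + rnorm(100) * 0.01   # x3 is nearly redundant
X  <- data.frame(x1, x2, x3)

r2 <- sapply(names(X), function(v) {
  fit <- lm(reformulate(setdiff(names(X), v), response = v), data = X)
  summary(fit)$r.squared
})
r2                    # R^2 of each predictor regressed on the others
names(X)[r2 > 0.9]    # predictors exceeding the .9 cutoff
```

Note that this flags every predictor involved in the near-linear relationship (here all three, since each can be recovered from the other two), so it tells you collinearity exists but not which predictor to drop; that choice still depends on the context of the analysis.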

Mar 26 in Machine Learning by Nandini
• 5,480 points

1 answer to this question.

0 votes

The kappa() function can be of assistance here: it computes the condition number of the model matrix, and a large value signals near-collinearity among its columns. Here's an example on simulated data; see help(kappa) for details.

> set.seed(42)
> y1 <- rnorm(100)
> y2 <- rnorm(100)
> y3 <- y1 + 2*y2 + rnorm(100)*0.0001    # y3 is approx. a linear comb. of y1 and y2
> mm12 <- model.matrix(~ y1 + y2)        # normal model, two indep. regressors
> mm123 <- model.matrix(~ y1 + y2 + y3)  # bad model with near collinearity
> kappa(mm12)                            # a 'low' kappa is good
[1] 1.166029
> kappa(mm123)                           # a 'high' kappa not good
[1] 121530.7

We can go even further by increasing the collinearity of the third regressor:

> y4 <- y1 + 2*y2 + rnorm(100)*0.000001  # even more collinear
> mm124 <- model.matrix(~ y1 + y2 + y4)
> kappa(mm124)
[1] 13955982
> y5 <- y1 + 2*y2                        # now y5 is linear comb of y1,y2
> mm125 <- model.matrix(~ y1 + y2 + y5)
> kappa(mm125)
[1] 1.067568e+16
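One caveat worth adding (my note, not part of the answer above): kappa() is sensitive to the scale of the regressors, so columns measured in very different units can produce a large condition number even when there is no collinearity at all. Standardizing the non-intercept columns first separates the two effects. A small sketch:

```r
set.seed(42)
y1 <- rnorm(100)
y2 <- rnorm(100) * 1000               # same information, much larger scale
mm <- model.matrix(~ y1 + y2)

kappa(mm)                             # large, driven purely by scaling
kappa(cbind(1, scale(mm[, -1])))      # small once columns are standardized
```

So before reading a "high" kappa as collinearity, it can be worth checking whether it survives standardization of the regressors.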

answered Mar 30 by Dev
• 6,000 points
