Negative coefficients in regression for positive coefficient of correlation

Question

I am building a multiple linear regression model using Python. I computed the correlation coefficient between each independent variable and the dependent variable, and all of them were greater than 0.5. However, the equation given by the regression model has a few negative coefficients. Why is that?

Nandini · Answer 1 · Mar 25, 2022

The situation you describe is entirely possible. The key is to look at how your independent variables are related to each other. If two of them are highly correlated, one of them can end up with a negative coefficient in your linear regression.

Consider the following scenario, in which you want to forecast y given independent variables x1 and x2:
Assume y is deterministic and follows the formula y = x1 + 2 * x2.
Assume that x2 is deterministic as well, and that x2 = 0.1 * x1.

Then you could say y = 1.2 * x1 + 0 * x2 as well as y = 0 * x1 + 12 * x2 or y = 2 * x1 - 8 * x2, because substituting x2 = 0.1 * x1 reduces each of them to y = 1.2 * x1. Your linear regression therefore has infinitely many equally good solutions. In the last one, x2 has a negative coefficient even though y and x2 are positively correlated. Nothing has gone wrong.
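The equivalence of those three equations can be checked numerically. The following is a minimal sketch using NumPy; the range chosen for x1 is an arbitrary assumption, and the variable names are only for illustration.

```python
import numpy as np

# Synthetic inputs for the scenario (the range of x1 is an arbitrary choice).
x1 = np.linspace(1.0, 10.0, 50)
x2 = 0.1 * x1        # x2 is an exact multiple of x1
y = x1 + 2 * x2      # the formula assumed for y

X = np.column_stack([x1, x2])

# The three coefficient pairs (b1, b2) from the text.
candidates = [(1.2, 0.0), (0.0, 12.0), (2.0, -8.0)]
all_match = all(np.allclose(X @ np.array(b), y) for b in candidates)

# Correlation between y and x2, and the rank of the design matrix.
corr_y_x2 = np.corrcoef(y, x2)[0, 1]
rank = np.linalg.matrix_rank(X)

print("all candidates reproduce y:", all_match)
print("corr(y, x2):", corr_y_x2)
print("rank of design matrix:", rank)
```

A design matrix with two columns but rank 1 is what makes the coefficients non-unique: any pair (b1, b2) with b1 + 0.1 * b2 = 1.2 fits the data exactly.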
That is why you should not assume that the sign of a coefficient in a multiple regression matches the sign of the correlation between that independent variable and the dependent variable. A coefficient describes the effect of one variable with the others held fixed, while a correlation looks at that variable on its own. And, of course, you can't draw any conclusions about causation from either one.
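The same effect shows up when the predictors are strongly but not perfectly correlated, in which case the fit has a unique solution. Here is a rough sketch with made-up data; the seed, sample size, noise levels, and the formula used to generate y are all assumptions chosen for illustration, not taken from the question.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
n = 500

# Two strongly correlated predictors (hypothetical data).
x1 = rng.normal(size=n)
x2 = x1 + 0.3 * rng.normal(size=n)

# y depends positively on x1 and negatively on x2, plus a little noise.
y = 2 * x1 - x2 + 0.1 * rng.normal(size=n)

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Pairwise correlations of each predictor with y.
corr_x1 = np.corrcoef(x1, y)[0, 1]
corr_x2 = np.corrcoef(x2, y)[0, 1]

print("corr(x1, y):", corr_x1)
print("corr(x2, y):", corr_x2)
print("intercept, b1, b2:", coef)
```

Both correlations come out above 0.5, yet the fitted coefficient on x2 is negative. On its own, x2 rises with y because it mostly tracks x1; once x1 is in the model, the remaining part of x2 pulls y down.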
