Is there a way to force the coefficient of an independent variable to be positive in a linear regression model in R?

Question

In lm(y ~ x1 + x2 + x3 + ... + xn), not all independent variables have positive coefficients. For example, we know that x1 to x5 must have positive coefficients and x6 to x10 must have negative coefficients. However, when lm(y ~ x1 + x2 + x3 + ... + x10) is run in R, the result is that some of x1 to x5 have negative coefficients and some of x6 to x10 have positive coefficients. I want to control this within a linear regression method. Is there a good way?

Dev · Answer 1 · Mar 7, 2022

A Few Constraints

You may be running into Simpson's Paradox, which describes situations in which the sign of a correlation can change depending on whether or not another variable is included.

nls with algorithm = "port" allows upper and lower bounds on the coefficients to be specified.
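As a rough sketch of the nls approach, the linear model can be written with explicit parameters so that a bound applies to each one. The simulated data, the parameter names b0, b1, b2, and the starting values are made up for illustration.

```r
# Simulated data (illustrative only): x1 should get a positive
# coefficient and x2 a negative one.
set.seed(1)
n   <- 200
x1  <- rnorm(n)
x2  <- rnorm(n)
y   <- 1 + 2 * x1 - 1.5 * x2 + rnorm(n)
dat <- data.frame(y, x1, x2)

# Write the linear model with named parameters and bound each one.
# Bounds follow the order of `start`: b0 is free, b1 >= 0, b2 <= 0.
fit <- nls(y ~ b0 + b1 * x1 + b2 * x2,
           data      = dat,
           start     = list(b0 = 0, b1 = 1, b2 = -1),  # must lie within the bounds
           algorithm = "port",
           lower     = c(-Inf, 0, -Inf),
           upper     = c( Inf, Inf, 0))

coef(fit)
```

Bounds are only honoured with algorithm = "port"; the default Gauss-Newton algorithm does not accept them.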

Use nnnpls in the nnls package, which supports an upper or lower bound of 0 on each coefficient, or, if all coefficients should be non-negative, use nnls in the same package.

Use bvls (bounded-variable least squares) in the bvls package and specify the bounds.
In the CVXR package's vignette, there is an example of executing non-negative least squares.

Reformulate it as a quadratic programming problem (see Wikipedia for the formulation) and solve it with the quadprog package.
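One possible way to set this up, assuming the quadprog package is installed, is sketched below. Least squares minimises b'X'Xb/2 - (X'y)'b up to a constant, which matches the form solve.QP expects. The simulated data and variable names are illustrative.

```r
library(quadprog)

# Simulated data (illustrative only).
set.seed(42)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 3 * x2 + rnorm(n)
X  <- cbind(1, x1, x2)            # intercept column plus predictors

# solve.QP minimises  -d'b + (1/2) b'Db  subject to  A'b >= b0.
Dmat <- crossprod(X)              # X'X
dvec <- drop(crossprod(X, y))     # X'y

# One column of Amat per constraint:
#   coefficient of x1 >= 0,  and  -(coefficient of x2) >= 0.
# The intercept is left unconstrained.
Amat <- cbind(c(0, 1, 0), c(0, 0, -1))
bvec <- c(0, 0)

sol <- solve.QP(Dmat, dvec, Amat, bvec)
b   <- sol$solution
b
```

solve.QP needs Dmat to be positive definite, so this sketch assumes the predictors are not perfectly collinear.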

The limSolve package also contains nnls. To convert the problem to a non-negative least squares problem, negate the columns that should have negative coefficients.
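The column trick can be sketched as follows: multiplying a column by -1 flips the sign of its coefficient, so a non-positive constraint becomes a non-negative one. This sketch uses nnls from the nnls package rather than limSolve, and it centres the data so that the intercept stays unconstrained; both choices, along with the simulated data, are assumptions made for illustration.

```r
library(nnls)

# Simulated data (illustrative only): x1 and x2 should be >= 0, x3 <= 0.
set.seed(7)
n <- 150
X <- cbind(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- drop(0.5 + X %*% c(2, 1, -1.5) + rnorm(n))
neg <- 3                          # index of columns that must be non-positive

# Centre predictors and response so the intercept can be recovered
# afterwards without being forced to be non-negative.
xm <- colMeans(X)
Xc <- sweep(X, 2, xm)
yc <- y - mean(y)

# Negate the columns whose coefficients should be negative.
Xc[, neg] <- -Xc[, neg]

fit <- nnls(Xc, yc)               # all coefficients constrained to >= 0
b   <- fit$x
b[neg] <- -b[neg]                 # flip back to the original sign convention
names(b) <- colnames(X)

b0 <- mean(y) - sum(xm * b)       # intercept implied by the centred fit
c("(Intercept)" = b0, b)
```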

The majority of these packages don't offer a formula interface and instead require a model matrix and dependent variable to be given as separate arguments. The model matrix can be calculated if df is a data frame containing the data and the first column is the dependent variable:

B <- model.matrix(~., df[-1])

and the dependent variable is

df[[1]]

Certain Penalties

Another option is to add a penalty to the least squares objective function, so that it becomes the sum of the squares of the residuals plus one or more additional terms that are functions of the coefficients and tuning parameters. Although this does not impose hard constraints on the signs, it may nevertheless produce the expected signs. This is especially helpful when the problem is ill-conditioned or there are more predictors than observations.

The ridge package's linearRidge function minimizes the sum of the squares of the residuals plus a penalty equal to lambda times the sum of squares of the coefficients. Lambda is a scalar tuning parameter that the software can determine automatically. When lambda is zero, it reduces to least squares. The software has a formula method, which, along with the automatic tuning, makes it quite simple to use.
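To make the ridge objective concrete, here is a minimal base R sketch of its closed-form solution. It is not how linearRidge is implemented (it does not rescale predictors or choose lambda automatically); the helper name ridge_coef and the simulated data are hypothetical.

```r
# Closed-form ridge estimate with an unpenalised intercept.
# ridge_coef is a hypothetical helper, not part of any package.
ridge_coef <- function(X, y, lambda) {
  xm <- colMeans(X)
  ym <- mean(y)
  Xc <- sweep(X, 2, xm)           # centre predictors
  yc <- y - ym                    # centre response
  # Solve (X'X + lambda * I) b = X'y on the centred data.
  b  <- solve(crossprod(Xc) + lambda * diag(ncol(X)), crossprod(Xc, yc))
  c(intercept = ym - sum(xm * b), drop(b))
}

# Simulated, strongly collinear predictors (illustrative only).
set.seed(3)
n  <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
X  <- cbind(x1 = x1, x2 = x2)
y  <- 1 + x1 + x2 + rnorm(n)

b_ols   <- ridge_coef(X, y, lambda = 0)    # same as ordinary least squares
b_ridge <- ridge_coef(X, y, lambda = 10)   # slopes shrunk towards zero
rbind(b_ols, b_ridge)
```

With collinear predictors like these, least squares slopes can be unstable, and the penalty pulls them towards smaller, more similar values, which is why the signs sometimes come out as expected without any explicit constraint.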

glmnet adds penalty terms with two tuning parameters. Least squares and ridge regression are included as special cases. It also supports bounds on the coefficients. There is some support for setting the two tuning parameters automatically, although there is no formula method and it is not as simple to use as the ridge package. More information can be found in the vignettes that come with it.
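A short sketch of the coefficient bounds in glmnet, assuming the package is installed, might look like this. The lower.limits and upper.limits arguments take one value per predictor; the simulated data and the fixed values of alpha and lambda are arbitrary choices for illustration.

```r
library(glmnet)

# Simulated data (illustrative only): x1, x2 should be >= 0; x3, x4 <= 0.
set.seed(123)
n <- 200
X <- matrix(rnorm(n * 4), n, 4,
            dimnames = list(NULL, paste0("x", 1:4)))
y <- drop(1 + X %*% c(1.5, 0.8, -1.2, -0.6) + rnorm(n))

# alpha = 0 gives a ridge penalty; lambda is fixed here rather than tuned.
# Lower limits must be <= 0 and upper limits must be >= 0.
fit <- glmnet(X, y,
              alpha        = 0,
              lambda       = 0.1,
              lower.limits = c(0, 0, -Inf, -Inf),
              upper.limits = c(Inf, Inf, 0, 0))

b <- as.matrix(coef(fit))[, 1]    # named vector, intercept first
b
```

In practice lambda would usually be chosen by cross-validation with cv.glmnet, which accepts the same bound arguments.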