RF partitions your feature space into boxes. A new data point follows the tree's yes/no splits and lands in one of those boxes. For classification, the prediction is the majority class: you count how many training points of each class sit in that box and take the most common one. For regression, the prediction is the mean of the values in that box.
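As a small sketch of that aggregation step (the samples shown are made up, not produced by an actual tree), the prediction inside one box looks like this:

```python
from collections import Counter

# Hypothetical training samples that ended up in the same box (leaf).
leaf_labels = ["cat", "dog", "cat", "cat"]   # classification case
leaf_values = [2.0, 3.0, 4.0, 3.0]           # regression case

# Classification: the box predicts the majority class.
majority_class = Counter(leaf_labels).most_common(1)[0][0]

# Regression: the box predicts the mean of the values it contains.
mean_value = sum(leaf_values) / len(leaf_values)

print(majority_class)  # cat
print(mean_value)      # 3.0
```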

In a regression setting, the following equation is used:

y = b0 + x1*b1 + x2*b2 + ... + xn*bn
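Evaluating this equation is just an intercept plus a sum of feature-coefficient products; the numbers below are arbitrary illustration values:

```python
def linear_model(x, b0, b):
    """Compute y = b0 + x1*b1 + ... + xn*bn."""
    return b0 + sum(xi * bi for xi, bi in zip(x, b))

# Arbitrary example: 3 features, intercept 0.5.
x = [1.0, 2.0, 3.0]
b0 = 0.5
b = [0.1, 0.2, 0.3]
print(linear_model(x, b0, b))  # 0.5 + 0.1 + 0.4 + 0.9 = 1.9
```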

where xi denotes your i-th feature and bi is the coefficient belonging to xi. In a linear regression the coefficients enter linearly; now suppose instead we have the following regression:

y = b0 + x1*b1 + x2*cos(b2)

Because the coefficient b2 does not enter linearly, this is not a linear regression. To check linearity, the derivative of y with respect to bi should be independent of bi, for every bi. For example, consider the first (linear) case:

dy/db1 = x1

This is independent of b1 (dy/db1 takes the same value no matter what b1 is), but the second example is not:

# y = b0 + x1*b1 + x2*cos(b2)
dy/db2 = x2*(-sin(b2))

This is not a linear regression because it is not independent of b2.
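You can verify this criterion numerically (with arbitrary feature values chosen here for illustration): estimate dy/db at two different coefficient values via finite differences; for the linear term the slope is the same everywhere, for the cosine term it is not:

```python
import math

x1, x2 = 2.0, 3.0
eps = 1e-6

def slope(f, b):
    # Central finite difference: approximates df/db at b.
    return (f(b + eps) - f(b - eps)) / (2 * eps)

# Linear term: y = x1*b1, so dy/db1 = x1 regardless of b1.
lin = lambda b1: x1 * b1
print(slope(lin, 0.0), slope(lin, 5.0))       # both ~2.0

# Nonlinear term: y = x2*cos(b2), so dy/db2 = -x2*sin(b2), which depends on b2.
nonlin = lambda b2: x2 * math.cos(b2)
print(slope(nonlin, 0.0), slope(nonlin, 1.0))  # ~0.0 vs ~-2.52
```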

As you can see, RF and linear regression are two distinct concepts, and the linearity of a regression has nothing to do with RF (or the other way round, for that matter).