Which polynomial regression degree is significant? Does it depend on the number of points or on other parameters?

Question

I am studying the stability of numerical derivatives as a function of the step size used to compute them. With a 15-point derivative (obtained by the finite difference method), I get the following plot (each multipole "l" corresponds to a parameter on which the derivative depends, but that doesn't matter here):

Now I would like to compare this 15-point derivative with the derivatives computed with 3, 5 and 7 points. For this, I have plotted the relative differences (in absolute value):

abs(f'_15_pts - f'_3_pts)/f'_3_pts for comparison between 15 and 3 points
abs(f'_15_pts - f'_5_pts)/f'_5_pts for comparison between 15 and 5 points
abs(f'_15_pts - f'_7_pts)/f'_7_pts for comparison between 15 and 7 points
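As a minimal sketch of the quantity being compared, the snippet below computes 3-point and 5-point central differences of a test function and their relative difference over a range of steps. The test function (np.sin), the evaluation point x0, and the helper names are assumptions for illustration only; the 5-point stencil stands in for the 15-point one, and the denominator is taken in absolute value so the result stays non-negative.

```python
import numpy as np

def central_diff_3(f, x, h):
    """3-point central difference, truncation error O(h^2)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def central_diff_5(f, x, h):
    """5-point central difference, truncation error O(h^4)."""
    return (-f(x + 2 * h) + 8 * f(x + h) - 8 * f(x - h) + f(x - 2 * h)) / (12.0 * h)

def relative_difference(d_ref, d_other):
    """abs(d_ref - d_other) / abs(d_other); abs in the denominator is an assumption."""
    return np.abs(d_ref - d_other) / np.abs(d_other)

# Hypothetical test case: derivative of sin at x0 = 1 for steps from 1e-8 to 1e-1
steps = np.logspace(-8.0, -1.0, 50)
x0 = 1.0
d3 = central_diff_3(np.sin, x0, steps)
d5 = central_diff_5(np.sin, x0, steps)

# Relative difference between the higher-order and the 3-point estimate
rel_5_3 = relative_difference(d5, d3)
```

For large steps the relative difference is dominated by the truncation error of the lower-order stencil; for very small steps round-off takes over, which is the noisy regime described in the question.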
My issue arises when I do a polynomial regression on the relative differences above for the multipole l=366.42 (the problem is the same for the other multipoles).

For example, when I do a cubic regression (degree 3), I get the following plot:

I don't know exactly how to interpret these results: maybe it means that the relative error is largest between the 3-point and 15-point derivatives, and smaller between 5 and 15 points and between 7 and 15 points.

Then, if I do a polynomial regression of degree 10, I get the following plot:

As you can see, this is totally different from the cubic regression above. So I don't know which degree to take for the polynomial regression, I mean which degree is relevant to get physically valid results: 3, 4, 6 or maybe 10. If the degree is too large, the results aren't valid: I get Dirac-like peaks and straight lines.

I guess the right polynomial degree depends on the number of points of the fitted curve (140 points for the first figure) and maybe on other parameters.

In conclusion, could anyone tell me whether there is a criterion for choosing the polynomial degree, I mean the degree that is most relevant from a relative-error point of view?

If I don't do any regression, I get the following plot, which is too noisy to interpret:

That's why I would like to fit these data, to see the differences between the relative errors more clearly.

PS: here are the code snippets for the polynomial regression:

import numpy as np
import matplotlib.pyplot as plt
import numpy.polynomial.polynomial as poly  # assumed: matches the polyval(x, coefs) call order below

stepForFit = np.logspace(-8.0,-1.0,10000)
coefs_3_15 = poly.polyfit(np.log10(stepNewArray), np.log10(errorRelative_3_15), 10)
ffit_3_15 = poly.polyval(np.log10(stepForFit), coefs_3_15)
coefs_5_15 = poly.polyfit(np.log10(stepNewArray), np.log10(errorRelative_5_15), 10)
ffit_5_15 = poly.polyval(np.log10(stepForFit), coefs_5_15)
coefs_7_15 = poly.polyfit(np.log10(stepNewArray), np.log10(errorRelative_7_15), 10)
ffit_7_15 = poly.polyval(np.log10(stepForFit), coefs_7_15)

# Plot interpolation curves
plt.plot(stepForFit[stepArrayId], np.power(10,ffit_3_15[stepArrayId]), colorDerPlot[0])
plt.plot(stepForFit[stepArrayId], np.power(10,ffit_5_15[stepArrayId]), colorDerPlot[1])
plt.plot(stepForFit[stepArrayId], np.power(10,ffit_7_15[stepArrayId]), colorDerPlot[2])

UPDATE 1: Given that I have no hypothesis (or model) for the value of the relative error, I can't put a priori constraints on the degree of the polynomial that best fits the data.

But maybe I have a clue, since the derivatives I have computed use 3, 5, 7 and 15 points. So the truncation errors are respectively of order O(h^2), O(h^4), O(h^6) and O(h^14).

For example, for the 3-point derivative, I have:

and so the final expression of the derivative:

By the way, I don't understand why we go from $O(h^4)$ to $O(h^2)$ between the two expressions.

But the main issue is that, for now, I have no hypothesis on the polynomial degree I should use.

Maybe I should test a range of polynomial degrees and compute the chi2 each time, so that the minimal chi2 gives me the right degree to use.

What do you think about this? Do NumPy or Python already provide specific functions for this kind of study?

UPDATE 2: I tried to determine, over a range of degrees from 1 to 15, which polynomial best fits the data. My criterion was to fit a polynomial of each degree and then compute the chi2 between the fitted values and the "experimental" data. If the new chi2 is lower than the previous one, I update the degree chosen for the polynomial regression.

Unfortunately, for each of the 3-, 5- and 7-point derivatives, this search for the "ideal degree" always returns the maximum degree of the interval explored.

The chi2 is indeed minimal for the highest degree, but this doesn't correspond to physical results. Keep in mind that for steps below 10^-4 the behavior of Cl' is chaotic, so I don't expect a physical interpretation there of the convergence of the derivatives as the number of points increases.

The interesting region is above 10^-4, where the behavior is more stable.

Given that my method of selecting the best degree from the chi2 doesn't work (it always gives the maximum degree of the range explored), is there another method to get a good fit? I know this is difficult because of the chaotic region at small steps.

One last thing: the cubic regression (degree 3) gives nice curves, but I don't understand why this happens only for degree 3 and not for higher degrees.

As someone said in the comments, the regression is overfitted for higher degrees: how can I fix this?

Nandini · Answer

You're more likely to overfit as the polynomial degree increases. One way to choose the degree is to fit the regression and then run a normality test on its residuals. Also, try rescaling the data: the second interpolation plot seems to follow the general path of the curve, but on a different scale.
The order of the polynomial in polynomial regression is normally a free parameter chosen by the user, because the user may know that the series behaves like a polynomial up to a particular degree without knowing the coefficients. If you know something about your data, you can set the degree yourself. For example, if you know your data depends linearly on the step, you can set the degree to 1 and get linear regression.
In that situation you don't want a higher degree, because you would be fitting a higher-order polynomial that you know the data does not follow. Likewise, if your data behaves like a polynomial of some degree, you don't want to fit anything higher. If you have no idea what the degree should be, make an educated guess. A reasonable technique is to plot the fitted polynomial over the data points, increasing the degree one step at a time. The curve should pass between the points rather than swerve towards individual ones. If you have a lot of outliers, there are methods more robust than least squares.
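The degree-by-degree inspection can also be done numerically. The sketch below is one possible approach, not something prescribed by this answer: it fits each degree on half of the points and measures the error on the other half (a simple hold-out check). The synthetic data, the alternating split, and the variable names are assumptions for illustration.

```python
import numpy as np
import numpy.polynomial.polynomial as poly

rng = np.random.default_rng(0)

# Synthetic stand-in for (log10(step), log10(relative error)):
# a cubic trend plus noise. Hypothetical data, not the asker's.
x = np.linspace(-1.0, 1.0, 140)
y = 2.0 * x**3 - x + 0.05 * rng.standard_normal(x.size)

# Fit on every other point, hold the remaining points out for validation
train = np.arange(x.size) % 2 == 0
held_out = ~train

def rms(residuals):
    """Root-mean-square of a residual vector."""
    return np.sqrt(np.mean(residuals**2))

results = []
for degree in range(1, 16):
    coefs = poly.polyfit(x[train], y[train], degree)
    err_train = rms(y[train] - poly.polyval(x[train], coefs))
    err_held_out = rms(y[held_out] - poly.polyval(x[held_out], coefs))
    results.append((degree, err_train, err_held_out))
    print(f"degree {degree:2d}  train RMS {err_train:.4f}  held-out RMS {err_held_out:.4f}")

# Degree with the smallest error on points the fit has not seen
best_degree = min(results, key=lambda r: r[2])[0]
```

The error on the fitted points can only go down as the degree increases, because each higher-degree model contains the lower-degree ones. That is why a chi2 computed on the fitted data always favours the highest degree tried. The held-out error does not have this property, so it is a more useful guide.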

Unless you have further information about your data, there is no way to determine an ideal degree: the degree is a hyperparameter. If you want an optimum, you need to add a criterion, such as "I want the lowest-degree polynomial that fits the data with an error below epsilon."
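A criterion of that kind could be sketched as follows. The function name lowest_degree_within, the choice of RMS residual as the error measure, and the test data are all assumptions for illustration.

```python
import numpy as np
import numpy.polynomial.polynomial as poly

def lowest_degree_within(x, y, epsilon, max_degree=15):
    """Return (degree, coefs) for the lowest degree whose RMS residual
    is at most epsilon, or None if no degree up to max_degree qualifies."""
    for degree in range(1, max_degree + 1):
        coefs = poly.polyfit(x, y, degree)
        rms = np.sqrt(np.mean((y - poly.polyval(x, coefs)) ** 2))
        if rms <= epsilon:
            return degree, coefs
    return None

# Hypothetical noise-free cubic: a tight tolerance requires degree 3
x = np.linspace(-1.0, 1.0, 140)
y = 2.0 * x**3 - x
found = lowest_degree_within(x, y, epsilon=1e-8)
```

The tolerance epsilon carries the extra information: with noisy data it should not be set below the noise level, otherwise the search simply runs to the highest degree again.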
Overfitting can be avoided by using a lower-degree polynomial. If that doesn't work, least-squares polynomial regression isn't the right method for you. You should investigate a regression method that uses a different metric, preprocess the data, or use a non-polynomial fit (fit a function of a known shape, using Levenberg-Marquardt for instance).
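As one example of fitting a function of known shape: if the error in the stable region is assumed to follow a power law C * h^p (as a truncation error of order O(h^p) would), the model is a straight line in log-log space, and ordinary linear least squares is enough. Levenberg-Marquardt is not needed for this particular shape. The sketch below uses synthetic data; the threshold h_min = 1e-4 is taken from the question's description of the stable region, and every other number is an assumption.

```python
import numpy as np
import numpy.polynomial.polynomial as poly

rng = np.random.default_rng(1)

# Synthetic error curve (hypothetical): O(h^2) truncation error plus
# round-off noise that grows as the step shrinks
steps = np.logspace(-8.0, -1.0, 140)
truncation = 0.3 * steps**2
roundoff = 1e-15 / steps * rng.uniform(0.0, 10.0, steps.size)
error = truncation + roundoff

# Keep only the region described as stable in the question
h_min = 1e-4
stable = steps > h_min

# log10(error) = log10(C) + p * log10(h): a degree-1 fit in log-log space
intercept, order = poly.polyfit(np.log10(steps[stable]), np.log10(error[stable]), 1)
prefactor = 10.0**intercept

print(f"estimated order p = {order:.3f}, prefactor C = {prefactor:.3f}")
```

Restricting the fit to the stable region and to a two-parameter model leaves no room for the fit to chase the noise at small steps, and the fitted slope can be compared directly with the expected order of the stencil.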
