Run an OLS regression with Pandas Data Frame

Question

I have a pandas data frame and I would like to be able to predict the values of column A from the values in columns B and C. Here is a toy example:
import pandas as pd
df = pd.DataFrame({"A": [10,20,30,40,50], 
                   "B": [20, 30, 10, 40, 50], 
                   "C": [32, 234, 23, 23, 42523]})
Ideally, I would have something like ols(A ~ B + C, data = df), but when I look at the examples from machine learning libraries like scikit-learn, they appear to feed the data to the model as a list of rows instead of columns. This would require me to reformat the data into lists inside lists, which seems to defeat the purpose of using pandas in the first place. What is the most Pythonic way to run an OLS regression (or, more generally, any machine learning algorithm) on data in a pandas data frame?
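On the scikit-learn point: its estimators' `fit` methods accept array-like input, and a pandas DataFrame of feature columns counts as array-like, so no conversion to nested lists should be needed. The sketch below is an illustration of that, not part of the original answer; it assumes scikit-learn is installed and uses `LinearRegression`, which fits ordinary least squares with an intercept by default.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({"A": [10, 20, 30, 40, 50],
                   "B": [20, 30, 10, 40, 50],
                   "C": [32, 234, 23, 23, 42523]})

# Feature columns as a 2-D DataFrame, target as a 1-D Series;
# scikit-learn converts both to arrays internally.
X = df[["B", "C"]]
y = df["A"]

model = LinearRegression()  # fit_intercept=True by default, i.e. plain OLS
model.fit(X, y)

print(model.intercept_, model.coef_)  # intercept, then one coefficient per column
print(model.predict(X))               # fitted values for the original rows
```

Because this is the same least-squares problem, the intercept and coefficients should agree with the statsmodels results in the answer below.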

Nandini · Answer

You can get very close to the result you want with the statsmodels package, which was one of pandas' optional dependencies before version 0.20.0 (it was used for a few things in pandas.stats):
import pandas as pd
import statsmodels.formula.api as smf

data = pd.DataFrame({"A": [10, 20, 30, 40, 50],
                     "B": [20, 30, 10, 40, 50],
                     "C": [32, 234, 23, 23, 42523]})
# The formula interface looks up A, B and C by column name in the DataFrame
# and adds an intercept term automatically.
result = smf.ols(formula="A ~ B + C", data=data).fit()
print(result.params)
Intercept    14.952480
B             0.401182
C             0.000352
dtype: float64

print(result.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      A   R-squared:                       0.579
Model:                            OLS   Adj. R-squared:                  0.158
Method:                 Least Squares   F-statistic:                     1.375
Date:                Thu, 14 Nov 2013   Prob (F-statistic):              0.421
Time:                        20:04:30   Log-Likelihood:                -18.178
No. Observations:                   5   AIC:                             42.36
Df Residuals:                       2   BIC:                             41.19
Df Model:                           2                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept     14.9525     17.764      0.842      0.489       -61.481    91.386
B              0.4012      0.650      0.617      0.600        -2.394     3.197
C              0.0004      0.001      0.650      0.583        -0.002     0.003
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   1.061
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.498
Skew:                          -0.123   Prob(JB):                        0.780
Kurtosis:                       1.474   Cond. No.                     5.21e+04
==============================================================================

Warnings:
[1] The condition number is large, 5.21e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
