Logistic regression coefficient meaning

0 votes

I'm trying to write my own logistic regressor (using batch/mini-batch gradient descent) for practice purposes.

I generated a random dataset (see below) with normally distributed inputs, and the output is binary (0,1). I manually used coefficients for the input and was hoping to be able to reproduce them (see below for the code snippet). However, to my surprise, neither my own code, nor sklearn LogisticRegression were able to reproduce the actual numbers (although the sign and order of magnitude are in line). Moreso, the coefficients my algorithm produced are different than the one produced by sklearn. Am I misinterpreting what the coefficients for a logistic regression are?

I will appreciate any insight into this discrepancy.

Thank you!

edit: I tried using statsmodels Logit and got yet a third set of slightly different values for the coefficients

Some more info that might be relevant: I wrote a linear regressor using an almost identical code and it worked perfectly, so I am fairly confident this is not a problem in the code. Also my regressor actually outperformed the sklearn one on the training set, and they have the exact same accuracy on the test set, so I have no reason to believe the regressors are wrong.

Code snippets for the generation of the dataset:

o1 = 2
o2 = -3
x[:,1]=np.random.rand(size)*2
x[:,2]=np.random.rand(size)*3
y = np.vectorize(sigmoid)(x[:,1]*o1+x[:,2]*o2 + np.random.normal(size=size))

so as can be seen, input coefficients are +2 and -3 (intercept 0); sklearn coefficients were ~2.8 and ~-4.8; my coefficients were ~1.7 and ~-2.6

and of the regressor (the most relevant parts of it):

for j in range(bin_size):
    xs = x[i]
    y_real = y[i]
    z = np.dot(self.coeff,xs)
    h = sigmoid(z)
    dc+= (h-y_real)*xs
self.coeff-= dc * (learning_rate/n)
Mar 10, 2022 in Machine Learning by Dev
• 6,000 points
734 views

1 answer to this question.

0 votes
What did the intercept teach you? It's hardly surprising, given that your y is a third-degree polynomial and your model only has two coefficients, whereas 3 + y-intercept would be required to model the response variable from predictors.
Furthermore, because to SGD, for example, values may differ, but coefficients may differ, resulting in correct y for a finite number of points.
To be sure and rule out the iterative approach failing, use np.linalg.inv to solve the normal equation and observe the coefficients. Also, check to see if regularization was applied in statsmodels and/or sklearn predicts.
answered Mar 23, 2022 by Nandini
• 5,480 points

Related Questions In Machine Learning

0 votes
1 answer

Different types of Logistic Regression

There are three main types of logistic ...READ MORE

answered May 13, 2019 in Machine Learning by Nikhil
11,825 views
0 votes
1 answer
0 votes
1 answer

Can we change the sigmoid with tanh in Logistic regression transforms??

Hi@Deepanshu, Yes, you can use tanh instead of ...READ MORE

answered May 12, 2020 in Machine Learning by MD
• 95,460 points
2,681 views
0 votes
1 answer
0 votes
2 answers
+1 vote
2 answers

how can i count the items in a list?

Syntax :            list. count(value) Code: colors = ['red', 'green', ...READ MORE

answered Jul 7, 2019 in Python by Neha
• 330 points

edited Jul 8, 2019 by Kalgi 4,391 views
0 votes
1 answer

What is the difference between linear regression and logistic regression?

Hi Dev, to answer your question Linear Regression ...READ MORE

answered Feb 2, 2022 in Machine Learning by Nandini
• 5,480 points
1,144 views
0 votes
1 answer

Assumptions of Naïve Bayes and Logistic Regression

There are very few difference between Naive ...READ MORE

answered Feb 7, 2022 in Machine Learning by Nandini
• 5,480 points
465 views
webinar REGISTER FOR FREE WEBINAR X
REGISTER NOW
webinar_success Thank you for registering Join Edureka Meetup community for 100+ Free Webinars each month JOIN MEETUP GROUP