I'm trying to write my own logistic regressor (using batch/mini-batch gradient descent) for practice purposes.

I generated a random dataset (see below) with uniformly distributed inputs plus normally distributed noise, and the output is binary (0, 1). I set the coefficients for the inputs manually and hoped to be able to recover them (see the code snippet below). However, to my surprise, neither my own code nor sklearn's LogisticRegression was able to reproduce the actual numbers (although the sign and order of magnitude are in line). Moreover, the coefficients my algorithm produced are different from the ones produced by sklearn. Am I misinterpreting what the coefficients of a logistic regression are?

I would appreciate any insight into this discrepancy.

Thank you!

Edit: I tried statsmodels' Logit and got yet a third, slightly different set of coefficient values.

Some more info that might be relevant: I wrote a linear regressor using almost identical code and it worked perfectly, so I am fairly confident the problem is not in the code. Also, my regressor actually outperformed the sklearn one on the training set, and the two have exactly the same accuracy on the test set, so I have no reason to believe the regressors are wrong.

Code snippet for the generation of the dataset:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

o1 = 2
o2 = -3
# x and size are defined earlier in the full script
x[:,1] = np.random.rand(size)*2
x[:,2] = np.random.rand(size)*3
y = np.vectorize(sigmoid)(x[:,1]*o1 + x[:,2]*o2 + np.random.normal(size=size))
```

So, as can be seen, the input coefficients are +2 and -3 (intercept 0); sklearn's coefficients were ~2.8 and ~-4.8; my coefficients were ~1.7 and ~-2.6.

And here is the regressor itself (the most relevant parts of it):

```python
for j in range(bin_size):
    xs = x[i]
    y_real = y[i]
    z = np.dot(self.coeff, xs)
    h = sigmoid(z)
    dc += (h - y_real)*xs
self.coeff -= dc * (learning_rate/n)
```
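For context, the update loop above can be wrapped into a self-contained sketch. The batching logic, epoch count, and zero initialization below are my assumptions, since the question only shows the inner update:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def fit_logistic(x, y, learning_rate=0.1, epochs=500, bin_size=32):
    """Mini-batch gradient descent for logistic regression (a sketch)."""
    n, d = x.shape
    coeff = np.zeros(d)
    for _ in range(epochs):
        for start in range(0, n, bin_size):
            xs = x[start:start + bin_size]
            ys = y[start:start + bin_size]
            h = sigmoid(xs @ coeff)          # predicted probabilities
            dc = (h - ys) @ xs               # log-loss gradient over the batch
            coeff -= dc * (learning_rate / len(ys))
    return coeff
```

On data whose labels are actually Bernoulli draws from the sigmoid probability, this recovers coefficients close to the generating values.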
Mar 10, 2022

## 1 answer to this question.

Did you fit an intercept? The discrepancy is hardly surprising: your y is generated from three terms (the two weighted inputs plus a noise term), while your model has only two coefficients; three parameters (the two slopes plus an intercept) would be needed to model the response variable properly.
Furthermore, with an iterative method such as SGD, different runs can end up at different coefficient values that still produce essentially the same y on a finite set of points.
To rule out the iterative approach failing, solve the problem directly (e.g. the normal equation via np.linalg.inv for the linear analogue) and inspect the coefficients. Also check whether regularization was applied by default in statsmodels and/or sklearn.
answered Mar 23, 2022
