When we train a model on data, it produces predicted values for a specific feature. That feature also has actual values in the data set. We try to keep the predicted values close to the real values, which gives the model better accuracy and predictions.

We use a cost function to measure how close the model's predictions are, i.e. how close the predicted values are to their corresponding real values in the data set.

The weights in the trained model are what allow it to predict new values accurately.

For example, if the model is

Y = 0.4*X + 0.2

then the predicted value is (0.4*X + 0.2), where the value of X varies.

Hence, if we consider y as the real value corresponding to X, the cost function measures how close (0.4*X + 0.2) is to y.

We need to find the weights (here, 0.4 and 0.2) that give our model the lowest cost (i.e. predicted values as close as possible to the real ones).

Gradient descent is one such optimization algorithm: it searches for the weights with the minimum cost by repeatedly trying alternative weights, i.e. updating the weights.

We start by running our model with some initial weights; gradient descent then updates the weights and re-estimates the model's cost with those weights over thousands of iterations, until the cost reaches its lowest value.

Gradient Descent is used for weight updating.
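The update loop described above can be sketched as follows. This assumes the mean-squared-error cost; the learning rate and iteration count are arbitrary choices for illustration, not values from the text.

```python
def gradient_descent(xs, ys, w=0.0, b=0.0, lr=0.05, iters=2000):
    """Fit Y = w*X + b by repeatedly stepping the weights against the
    gradient of the mean-squared-error cost."""
    n = len(xs)
    for _ in range(iters):
        # Gradients of the MSE cost with respect to w and b
        dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * dw   # update each weight in the direction that lowers the cost
        b -= lr * db
    return w, b

# Data generated from Y = 0.4*X + 0.2, so the fitted weights should
# come out close to w = 0.4 and b = 0.2.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.4 * x + 0.2 for x in xs]
w, b = gradient_descent(xs, ys)
print(round(w, 2), round(b, 2))
```

Each iteration nudges the weights a small step in the direction that reduces the cost; after enough iterations the weights settle near the values with the lowest cost.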