The example you gave is one-dimensional, which is unusual in machine learning, where inputs typically have many features. Applying the basic closed-form approach then requires inverting a matrix, which can be expensive or ill-conditioned.

The problem is usually stated as a least-squares problem, which is comparatively easy. Instead of gradient descent, an ordinary least-squares solver can be used (and often is). If the number of data points is large, though, a direct least-squares solve may be prohibitively expensive, while (stochastic) gradient descent can find a solution whose test-set error is as good as the exact solution's in orders of magnitude less time.
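A minimal sketch of that tradeoff, on synthetic data (the problem sizes, learning rate, and iteration count here are illustrative choices, not prescriptions): an off-the-shelf solver (`np.linalg.lstsq`) and plain gradient descent on the MSE cost reach essentially the same weights.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = X @ true_w + 0.1 * rng.normal(size=n)

# Exact solution from an off-the-shelf least-squares solver.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Plain gradient descent on the MSE cost (illustrative step size / iterations).
w = np.zeros(d)
lr = 0.1
for _ in range(500):
    grad = 2.0 / n * X.T @ (X @ w - y)  # gradient of mean((Xw - y)^2)
    w -= lr * grad

# The two solutions agree to within small numerical tolerance.
print(np.max(np.abs(w - w_lstsq)))
```

On a problem this small the direct solve wins easily; gradient descent's advantage only appears when `n` (or `d`) is large enough that forming and factoring the system is the bottleneck, and a stochastic variant that sees one mini-batch per step becomes attractive.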

If your problem is small enough to be solved quickly by an off-the-shelf least-squares solver, gradient descent is probably not for you.

Gradient descent is a general algorithm: it can minimize essentially any differentiable cost function. In regression problems, a common cost is the mean squared error (MSE). Finding the closed-form solution requires inverting a matrix that may be ill-conditioned (its determinant is very close to zero, so the computed inverse is not numerically robust). To get around this, people often use gradient descent, which never forms that inverse and so does not suffer from the ill-conditioning problem.
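To make the ill-conditioning concrete, here is a sketch with two nearly collinear features (the collinearity scale `1e-6` and the hyperparameters are made-up for illustration): the normal-equations matrix X^T X has an enormous condition number, so its inverse is numerically unreliable, yet gradient descent, which only needs matrix-vector products, still drives the MSE down to the noise level.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 1e-6 * rng.normal(size=n)  # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + 0.01 * rng.normal(size=n)

# The normal-equations matrix is nearly singular: huge condition number.
cond = np.linalg.cond(X.T @ X)
print(f"condition number of X^T X: {cond:.2e}")

# Gradient descent never inverts X^T X; it only uses matrix-vector products.
w = np.zeros(2)
lr = 0.2
for _ in range(2000):
    grad = 2.0 / n * X.T @ (X @ w - y)
    w -= lr * grad

mse = np.mean((X @ w - y) ** 2)
print(f"final MSE: {mse:.2e}")
```

Note the individual weights need not match the "true" (3, 0): with near-duplicate columns many weight vectors give almost identical predictions, and gradient descent simply settles on one of them, which is exactly why the non-robust inverse is not missed.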