A large number of layers causes serious problems for standard backpropagation. One drawback of backpropagation is the problem of local minima: the number of local minima of the error function grows with each additional layer. And it is not only true minima that cause problems; the error function may also contain flat regions (plateaus) in which steepest descent makes hardly any progress, because changing one or more weights barely changes the error.
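A minimal sketch of why early layers sit in such flat regions: with sigmoid activations, backpropagation multiplies the gradient by the sigmoid's derivative at every layer, which is at most 0.25. The 20-layer chain and the choice of evaluating the derivative at 0 are illustrative assumptions, not taken from the text.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# Backpropagated gradient through 20 sigmoid layers: each layer
# multiplies the gradient by sigmoid'(activation) <= 0.25, so the
# product shrinks geometrically -- the error surface looks flat
# to the early layers.
grad = 1.0
for layer in range(20):
    grad *= sigmoid_prime(0.0)   # sigmoid'(0) = 0.25, the maximum
print(grad)                      # 0.25**20, roughly 9.1e-13
```

Even in this best case (derivative always at its maximum), the gradient reaching the first layer is vanishingly small, so weight updates there change the error almost not at all.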
In a network with many layers, each layer of cells can also provide a level of abstraction, which makes it possible to solve more difficult problems. Deep learning addresses exactly this point. The basic idea is to perform unsupervised learning at each layer in addition to applying the steepest descent method across the whole network. The goal of this unsupervised learning is to extract characteristic features at each layer.
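A sketch of this greedy layer-wise idea, assuming tiny linear autoencoders as the unsupervised feature extractor (the function names, the toy data, and the learning-rate and epoch settings are illustrative assumptions, not from the text): each layer is trained to reconstruct its own input, and the next layer is then trained on the features the previous one produces.

```python
import random

random.seed(0)

def train_autoencoder(data, n_hidden, epochs=500, lr=0.02):
    """Unsupervised step for one layer: a tiny linear autoencoder
    learns an encoder W (and decoder V) by minimizing the squared
    reconstruction error with stochastic gradient descent."""
    n_in = len(data[0])
    W = [[random.uniform(-0.5, 0.5) for _ in range(n_in)]
         for _ in range(n_hidden)]
    V = [[random.uniform(-0.5, 0.5) for _ in range(n_hidden)]
         for _ in range(n_in)]
    for _ in range(epochs):
        for x in data:
            h = [sum(W[j][i] * x[i] for i in range(n_in))
                 for j in range(n_hidden)]                    # encode
            r = [sum(V[i][j] * h[j] for j in range(n_hidden))
                 for i in range(n_in)]                        # decode
            err = [r[i] - x[i] for i in range(n_in)]
            for i in range(n_in):                             # decoder step
                for j in range(n_hidden):
                    V[i][j] -= lr * err[i] * h[j]
            for j in range(n_hidden):                         # encoder step
                back = sum(err[i] * V[i][j] for i in range(n_in))
                for i in range(n_in):
                    W[j][i] -= lr * back * x[i]
    return W, V

def encode(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Greedy layer-wise pretraining: layer 2 is trained on the features
# that layer 1 extracts, not on the raw inputs.
data = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0],
        [1.0, 0.9, 0.1, 0.0], [0.1, 0.0, 0.9, 1.0]]
W1, V1 = train_autoencoder(data, n_hidden=2)
features = [encode(W1, x) for x in data]
W2, V2 = train_autoencoder(features, n_hidden=1)
```

After this unsupervised pretraining, the stacked encoder weights would serve as the starting point for ordinary supervised steepest descent over the whole network.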
Over the years, several techniques have been established that achieve even better results, such as:
- Residual Network
- Batch Normalization
- Rectified Linear Unit (ReLU)
Deep learning is therefore no longer mere hype!
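The three techniques above can be combined in a single forward pass, sketched here in plain Python (the particular ordering linear → batch normalization → ReLU inside the block, the toy batch, and the weight matrix are illustrative assumptions, not prescribed by the text):

```python
import math

def relu(v):
    """Rectified Linear Unit: negative activations are clipped to zero."""
    return [max(0.0, x) for x in v]

def batch_norm(batch, eps=1e-5):
    """Normalize each feature over the batch to mean 0 and variance 1."""
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for i in range(d):
        col = [row[i] for row in batch]
        mean = sum(col) / n
        var = sum((c - mean) ** 2 for c in col) / n
        for k in range(n):
            out[k][i] = (batch[k][i] - mean) / math.sqrt(var + eps)
    return out

def residual_block(batch, W):
    """y = x + relu(batch_norm(W x)): the skip connection adds the
    input back in, so the gradient can flow past the transformation
    unchanged -- the key idea of residual networks."""
    d = len(W)
    transformed = [[sum(W[j][i] * x[i] for i in range(d)) for j in range(d)]
                   for x in batch]
    activated = [relu(v) for v in batch_norm(transformed)]
    return [[x[i] + a[i] for i in range(d)] for x, a in zip(batch, activated)]

batch = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]
W = [[0.5, -0.2], [0.1, 0.4]]
out = residual_block(batch, W)
```

Note that with an all-zero weight matrix the block reduces to the identity: the skip connection passes the input through untouched, which is exactly why very deep stacks of such blocks remain trainable.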