Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation
This work addresses foundational theoretical challenges in machine learning and is aimed at researchers. It is incremental in that it assembles existing pieces rather than introducing new paradigms.
The paper tackles the gap between deep learning practice and mathematical theory by exploring interpolation and over-parameterization as key concepts for understanding generalization and optimization in neural networks, with the aim of contributing toward a general theory of machine learning.
In the past decade, the mathematical theory of machine learning has lagged far behind the triumphs of deep neural networks on practical challenges. However, the gap between theory and practice is gradually starting to close. In this paper I will attempt to assemble some pieces of the remarkable and still incomplete mathematical mosaic emerging from the efforts to understand the foundations of deep learning. The two key themes will be interpolation and its sibling, over-parameterization. Interpolation corresponds to fitting data, even noisy data, exactly. Over-parameterization enables interpolation and provides the flexibility to select a suitable interpolating model. As we will see, just as a physical prism separates colors mixed within a ray of light, the figurative prism of interpolation helps to disentangle generalization and optimization properties within the complex picture of modern machine learning. This article is written in the belief and hope that a clearer understanding of these issues brings us a step closer to a general theory of deep learning and machine learning.