OC LG, Jan 13

Convergence of gradient flow for learning convolutional neural networks

Jona-Maria Diederen, Holger Rauhut, Ulrich Terstiege

arXiv:2601.08547v1 2.5 h-index: 9

Originality: Synthesis-oriented

AI Analysis

The paper provides theoretical convergence guarantees for optimization in a simplified setting, an incremental contribution for researchers in machine learning theory.

The paper addresses the difficulty of analyzing optimization methods for the non-convex training objectives of convolutional neural networks. It studies linear convolutional networks and shows that gradient flow applied to the empirical risk, for certain loss functions including the square loss, always converges to a critical point under a mild condition on the training data.
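
For orientation, the objects named in this summary can be written out as follows. The notation is an assumption made for illustration and is not taken from the paper: N filters w^{(1)}, ..., w^{(N)}, training pairs (x_i, y_i) for i = 1, ..., m, plain convolution without strides, and a factor 1/2 in the square loss.

```latex
% Illustrative notation only (assumed, not the paper's):
% a linear convolutional network is a composition of convolutions with no nonlinearity
f_w(x) = w^{(N)} * \cdots * w^{(1)} * x

% empirical risk with the square loss over m training pairs
L(w) = \frac{1}{2} \sum_{i=1}^{m} \bigl\| f_w(x_i) - y_i \bigr\|_2^2

% gradient flow: the continuous-time limit of gradient descent
\frac{d}{dt} w(t) = -\nabla L\bigl(w(t)\bigr), \qquad w(0) = w_0

% convergence to a critical point
\lim_{t \to \infty} w(t) = w^{\ast}, \qquad \nabla L(w^{\ast}) = 0
```

Although f_w is linear in x, the risk L is a non-convex polynomial in the filters whenever N is at least 2, which is why convergence is not automatic.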

Convolutional neural networks are widely used in imaging and image recognition. Learning such networks from training data leads to the minimization of a non-convex function. This makes the analysis of standard optimization methods such as variants of (stochastic) gradient descent challenging. In this article we study the simplified setting of linear convolutional networks. We show that the gradient flow (to be interpreted as an abstraction of gradient descent) applied to the empirical risk defined via certain loss functions including the square loss always converges to a critical point, under a mild condition on the training data.
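
The setting in the abstract can be explored numerically. The sketch below is a minimal illustration under assumptions that are not taken from the paper: a two-layer network of 1D circular convolutions, synthetic data produced by hypothetical teacher filters, and a forward Euler discretization of the gradient flow (that is, plain gradient descent with a small step). All function names are made up for this example.

```python
import math
import random


def circ_conv(w, x):
    """Circular convolution of a short filter w with a signal x."""
    n = len(x)
    return [sum(w[j] * x[(i - j) % n] for j in range(len(w))) for i in range(n)]


def circ_corr(w, r):
    """Adjoint of x -> circ_conv(w, x), used to backpropagate residuals."""
    n = len(r)
    return [sum(w[j] * r[(m + j) % n] for j in range(len(w))) for m in range(n)]


def loss_and_grads(w1, w2, data):
    """Square-loss empirical risk of x -> w2 * (w1 * x) and its gradients."""
    loss = 0.0
    g1 = [0.0] * len(w1)
    g2 = [0.0] * len(w2)
    for x, y in data:
        n = len(x)
        h = circ_conv(w1, x)        # hidden layer
        out = circ_conv(w2, h)      # network output
        r = [o - t for o, t in zip(out, y)]
        loss += 0.5 * sum(v * v for v in r)
        for j in range(len(w2)):
            g2[j] += sum(r[i] * h[(i - j) % n] for i in range(n))
        gh = circ_corr(w2, r)       # gradient with respect to h
        for j in range(len(w1)):
            g1[j] += sum(gh[m] * x[(m - j) % n] for m in range(n))
    return loss, g1, g2


def simulate_gradient_flow(data, k=3, step=1e-3, iters=2000, seed=0):
    """Forward Euler steps w <- w - step * grad as a stand-in for the flow."""
    rng = random.Random(seed)
    w1 = [rng.gauss(0.0, 0.3) for _ in range(k)]
    w2 = [rng.gauss(0.0, 0.3) for _ in range(k)]
    history = []
    for _ in range(iters):
        loss, g1, g2 = loss_and_grads(w1, w2, data)
        grad_norm = math.sqrt(sum(g * g for g in g1 + g2))
        history.append((loss, grad_norm))
        w1 = [w - step * g for w, g in zip(w1, g1)]
        w2 = [w - step * g for w, g in zip(w2, g2)]
    return w1, w2, history


# Synthetic training data from hypothetical teacher filters (an assumption).
data_rng = random.Random(1)
teacher1, teacher2 = [1.0, -0.5, 0.25], [0.5, 0.5, 0.0]
data = []
for _ in range(5):
    x = [data_rng.gauss(0.0, 1.0) for _ in range(8)]
    data.append((x, circ_conv(teacher2, circ_conv(teacher1, x))))

w1, w2, history = simulate_gradient_flow(data)
print("initial loss, gradient norm:", history[0])
print("final loss, gradient norm:  ", history[-1])
```

Tracking the gradient norm alongside the loss is the numerical counterpart of "converges to a critical point": a decreasing loss alone does not show that the iterates settle. A finite run like this can only illustrate the statement, and the discretized iteration is not the continuous flow that the paper analyzes.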
