Ulrich Terstiege

h-index: 9

3 papers

110 citations

Novelty: 40%

AI Score: 36

Ranked #120,507 of 201,326 authors (top 60%); #371 in OC (top 56%)

3 Papers

OC, Jan 13

Convergence of gradient flow for learning convolutional neural networks

Jona-Maria Diederen, Holger Rauhut, Ulrich Terstiege

Convolutional neural networks are widely used in imaging and image recognition. Learning such networks from training data leads to the minimization of a non-convex function. This makes the analysis of standard optimization methods such as variants of (stochastic) gradient descent challenging. In this article we study the simplified setting of linear convolutional networks. We show that the gradient flow (to be interpreted as an abstraction of gradient descent) applied to the empirical risk defined via certain loss functions including the square loss always converges to a critical point, under a mild condition on the training data.

LG, Aug 4, 2021

Convergence of gradient descent for learning linear neural networks

Gabin Maxime Nguegnang, Holger Rauhut, Ulrich Terstiege

We study the convergence properties of gradient descent for training deep linear neural networks, i.e., deep matrix factorizations, by extending a previous analysis for the related gradient flow. We show that, under suitable conditions on the step sizes, gradient descent converges to a critical point of the loss function, which in this article is the square loss. Furthermore, we demonstrate that for almost all initializations gradient descent converges to a global minimum in the case of two layers. In the case of three or more layers, we show that gradient descent converges to a global minimum on the manifold of matrices of some fixed rank, where the rank cannot be determined a priori.
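As a rough illustration of the two-layer setting described in this abstract, the following is a minimal NumPy sketch of plain gradient descent on the square loss of a linear network. It is not the authors' code: the function name `train_two_layer_linear`, the `1/(2m)` loss normalization, the constant step size, the small random initialization, and the hidden width are all assumptions made for the example, and the paper's actual step-size conditions are not reproduced here.

```python
import numpy as np


def train_two_layer_linear(X, Y, hidden=3, step=0.05, iters=5000, seed=0):
    """Gradient descent on L(W1, W2) = 1/(2m) * ||W2 @ W1 @ X - Y||_F^2.

    X has shape (d_in, m) and Y has shape (d_out, m), one column per sample.
    All hyperparameters are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    d_in, m = X.shape
    d_out = Y.shape[0]

    # Small random initialization of both factors (an assumption).
    W1 = 0.1 * rng.standard_normal((hidden, d_in))
    W2 = 0.1 * rng.standard_normal((d_out, hidden))

    losses = []
    for _ in range(iters):
        residual = W2 @ W1 @ X - Y             # shape (d_out, m)
        losses.append(0.5 * np.sum(residual ** 2) / m)

        # Gradient of the loss with respect to the product W = W2 @ W1.
        grad_product = residual @ X.T / m      # shape (d_out, d_in)

        # Chain rule through the factorization, using the current factors.
        grad_W1 = W2.T @ grad_product
        grad_W2 = grad_product @ W1.T

        W1 = W1 - step * grad_W1
        W2 = W2 - step * grad_W2

    return W1, W2, losses
```

Calling the function on synthetic data generated as `Y = A @ X` for a fixed matrix `A` and comparing `W2 @ W1` with `A` is one simple way to observe the behaviour the abstract describes, keeping in mind that a single run only illustrates the result and does not verify it.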

OC, Oct 12, 2019

Learning deep linear neural networks: Riemannian gradient flows and convergence to global minimizers

Bubacarr Bah, Holger Rauhut, Ulrich Terstiege et al.

We study the convergence of gradient flows related to learning deep linear neural networks (where the activation function is the identity map) from data. In this case, the composition of the network layers amounts to simply multiplying the weight matrices of all layers together, resulting in an overparameterized problem. The gradient flow with respect to these factors can be re-interpreted as a Riemannian gradient flow on the manifold of rank-$r$ matrices endowed with a suitable Riemannian metric. We show that the flow always converges to a critical point of the underlying functional. Moreover, we establish that, for almost all initializations, the flow converges to a global minimum on the manifold of rank $k$ matrices for some $k\leq r$.
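For orientation, the setting in this abstract can be written out as follows. This is a sketch under assumed notation: the symbols $N$ (number of layers), $W_j$ (weight matrix of layer $j$), $X$ and $Y$ (data and target matrices), and the choice of the square loss are introduced here for illustration and are not taken from the listing itself.

$$
W = W_N W_{N-1} \cdots W_1, \qquad
L(W_1, \dots, W_N) = \tfrac{1}{2} \, \lVert W_N \cdots W_1 X - Y \rVert_F^2 ,
$$

$$
\dot{W}_j(t) = -\nabla_{W_j} L\bigl(W_1(t), \dots, W_N(t)\bigr)
= -\,(W_N \cdots W_{j+1})^{\top} \,(W X - Y)\, X^{\top} \,(W_{j-1} \cdots W_1)^{\top},
\qquad j = 1, \dots, N .
$$

Under this reading, the network output depends on the factors only through the product $W$, which is why the flow on the factors can be studied as a flow on matrices of bounded rank.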