Leonie Kreis

3 papers

9 citations

Novelty 43%

AI Score 25

Ranked #170,199 of 205,806 authors (top 83%); #36,966 in LG (top 87%)

3 Papers

LG · Nov 27, 2023 · Code

SensLI: Sensitivity-Based Layer Insertion for Neural Networks

Leonie Kreis, Evelyn Herberg, Frederik Köhne et al.

The training of neural networks requires tedious and often manual tuning of the network architecture. We propose a systematic approach to inserting new layers during the training process. Our method eliminates the need to choose a fixed network size before training, is numerically inexpensive to execute and applicable to various architectures including fully connected feedforward networks, ResNets and CNNs. Our technique borrows ideas from constrained optimization and is based on first-order sensitivity information of the loss function with respect to the virtual parameters that additional layers, if inserted, would offer. In numerical experiments, our proposed sensitivity-based layer insertion technique (SensLI) exhibits improved performance on training loss and test error, compared to training on a fixed architecture, and reduced computational effort in comparison to training the extended architecture from the beginning. Our code is available on https://github.com/mathemml/SensLI.
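To make the idea of "sensitivity with respect to virtual parameters" concrete, here is a minimal sketch on a toy two-layer linear model. It is not taken from the paper or its repository: the residual form of the candidate layer, the squared-error loss, and the function name `virtual_layer_sensitivity` are all assumptions chosen for illustration. The candidate layer is initialised at zero so that inserting it leaves the current model unchanged, and the gradient of the loss with respect to its weights is evaluated at that point.

```python
import numpy as np

def virtual_layer_sensitivity(W1, W2, X, Y):
    """Norm of dL/dV at V = 0 for a hypothetical inserted residual layer.

    Assumed extended model (illustrative only):
        pred = W2 @ (H + V @ tanh(H)),  H = W1 @ X
    with loss L = 0.5 * mean_i ||pred_i - y_i||^2.
    At V = 0 the extended model coincides with the current model.
    """
    H = W1 @ X                # hidden activations, shape (d, n)
    R = W2 @ H - Y            # residuals of the current model, shape (m, n)
    n = X.shape[1]
    # Chain rule at V = 0: dL/dV = W2^T R tanh(H)^T / n
    grad_V = (W2.T @ R) @ np.tanh(H).T / n
    return np.linalg.norm(grad_V)

rng = np.random.default_rng(0)
W1 = rng.standard_normal((3, 2))
W2 = rng.standard_normal((1, 3))
X = rng.standard_normal((2, 5))
Y = rng.standard_normal((1, 5))
print(virtual_layer_sensitivity(W1, W2, X, Y))
```

A large value suggests that adding the layer at this position could reduce the loss to first order; a value of zero means the candidate layer offers no first-order improvement.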

LG · Nov 26, 2023

Frobenius-Type Norms and Inner Products of Matrices and Linear Maps with Applications to Neural Network Training

Roland Herzog, Frederik Köhne, Leonie Kreis et al.

The Frobenius norm is a frequent choice of norm for matrices. In particular, the underlying Frobenius inner product is typically used to evaluate the gradient of an objective with respect to a matrix variable, such as those occurring in the training of neural networks. We provide a broader view on the Frobenius norm and inner product for linear maps or matrices, and establish their dependence on inner products in the domain and co-domain spaces. This shows that the classical Frobenius norm is merely one special element of a family of more general Frobenius-type norms. The significant extra freedom furnished by this realization can be used, among other things, to precondition neural network training.
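The dependence on the domain and co-domain inner products can be sketched numerically. The snippet below assumes the standard Hilbert-Schmidt construction: for a matrix A mapping R^n to R^m, with inner products given by symmetric positive definite matrices M (domain) and N (co-domain), the inner product is trace(M^{-1} A^T N B). The paper's own notation and conventions may differ, and the function names are hypothetical.

```python
import numpy as np

def frobenius_type_inner(A, B, M, N):
    """<A, B> = trace(M^{-1} A^T N B), with M (n x n) and N (m x m) SPD."""
    return np.trace(np.linalg.solve(M, A.T @ N @ B))

def riesz_gradient(G, M, N):
    """Gradient representative under the inner product above.

    G is the usual gradient (w.r.t. the classical Frobenius inner product).
    The matrix N^{-1} G M satisfies <N^{-1} G M, H> = trace(G^T H) for all H.
    """
    return np.linalg.solve(N, G @ M)

rng = np.random.default_rng(0)
m, n = 3, 2
A, B, G = (rng.standard_normal((m, n)) for _ in range(3))
# Random SPD matrices for the domain (M) and co-domain (N) inner products
Qm = rng.standard_normal((n, n)); M = Qm @ Qm.T + n * np.eye(n)
Qn = rng.standard_normal((m, m)); N = Qn @ Qn.T + m * np.eye(m)

# With identity inner products the classical Frobenius inner product is recovered
print(np.isclose(frobenius_type_inner(A, B, np.eye(n), np.eye(m)), np.sum(A * B)))
```

Under these assumptions, replacing G by `riesz_gradient(G, M, N)` in a gradient step acts as a left and right preconditioner on the weight matrix.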

OC · Nov 28, 2023

Adaptive Step Sizes for Preconditioned Stochastic Gradient Descent

Frederik Köhne, Leonie Kreis, Anton Schiela et al.

This paper proposes a novel approach to adaptive step sizes in stochastic gradient descent (SGD) by utilizing quantities that we have identified as numerically traceable -- the Lipschitz constant for gradients and a concept of the local variance in search directions. Our findings yield a nearly hyperparameter-free algorithm for stochastic optimization, which has provable convergence properties and exhibits truly problem-adaptive behavior on classical image classification tasks. Our framework is set in a general Hilbert space and thus enables the potential inclusion of a preconditioner through the choice of the inner product.
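As a rough illustration of one ingredient named in the abstract, the sketch below estimates a local Lipschitz constant of the gradient from two consecutive iterates and uses its reciprocal as the step size. This is not the paper's algorithm: it is deterministic, ignores the variance term entirely, and the function name and the secant-type estimate are assumptions made for this example.

```python
import numpy as np

def gd_with_lipschitz_steps(grad, x0, first_step=1e-3, iters=20):
    """Gradient descent with step 1/L, where L is a secant estimate
    L ~ ||g_k - g_{k-1}|| / ||x_k - x_{k-1}|| (illustrative only)."""
    x_prev = np.asarray(x0, dtype=float)
    g_prev = grad(x_prev)
    x = x_prev - first_step * g_prev      # small initial step to get two iterates
    for _ in range(iters):
        g = grad(x)
        dx = np.linalg.norm(x - x_prev)
        if dx == 0.0:                     # iterates stopped moving
            break
        L = np.linalg.norm(g - g_prev) / dx
        if L == 0.0:                      # gradient locally constant, no estimate
            break
        x_prev, g_prev = x, g
        x = x - g / L                     # step size 1/L
    return x

# Quadratic f(x) = 2 * ||x||^2 has gradient 4x, so the Lipschitz constant is 4
x_final = gd_with_lipschitz_steps(lambda x: 4.0 * x, [1.0, -2.0])
print(np.linalg.norm(x_final))
```

On this quadratic the estimate recovers the true constant, so the iteration reaches the minimizer almost immediately; on stochastic problems the estimate is noisy, which is where a variance-aware rule such as the one the abstract describes becomes relevant.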