STNov 4, 2022
Concentration inequalities for leave-one-out cross validationBenny Avelin, Lauri Viitasaari
In this article we prove that estimator stability is enough to show that leave-one-out cross validation is a sound procedure, by providing concentration bounds in a general framework. In particular, we provide concentration bounds beyond Lipschitz continuity assumptions on the loss or on the estimator. We obtain our results by relying on random variables with distribution satisfying the logarithmic Sobolev inequality, providing us a relatively rich class of distributions. We illustrate our method by considering several interesting examples, including linear regression, kernel density estimation, and stabilized/truncated estimators such as stabilized kernel regression.
MLDec 31, 2022
Exploring Singularities in point clouds with the graph Laplacian: An explicit approachMartin Andersson, Benny Avelin
We develop theory and methods that use the graph Laplacian to analyze the geometry of the underlying manifold of datasets. Our theory provides theoretical guarantees and explicit bounds on the functional forms of the graph Laplacian when it acts on functions defined close to singularities of the underlying manifold. We use these explicit bounds to develop tests for singularities and propose methods that can be used to estimate geometric properties of singularities in the datasets.
LGFeb 3, 2025
Training in reverse: How iteration order influences convergence and stability in deep learningBenoit Dherin, Benny Avelin, Anders Karlsson et al.
Despite exceptional achievements, training neural networks remains computationally expensive and is often plagued by instabilities that can degrade convergence. While learning rate schedules can help mitigate these issues, finding optimal schedules is time-consuming and resource-intensive. This work explores theoretical issues concerning training stability in the constant-learning-rate (i.e., without schedule) and small-batch-size regime. Surprisingly, we show that the order of gradient updates affects stability and convergence in gradient-based optimizers. We illustrate this new line of thinking using backward-SGD, which processes batch gradient updates like SGD but in reverse order. Our theoretical analysis shows that in contractive regions (e.g., around minima) backward-SGD converges to a point while the standard forward-SGD generally only converges to a distribution. This leads to improved stability and convergence which we demonstrate experimentally. While full backward-SGD is computationally intensive in practice, it highlights opportunities to exploit reverse training dynamics (or more generally alternate iteration orders) to improve training. To our knowledge, this represents a new and unexplored avenue in deep learning optimization.
MEDec 8, 2023
Sequential inductive prediction intervalsBenny Avelin
In this paper we explore the concept of sequential inductive prediction intervals using theory from sequential testing. We furthermore introduce a 3-parameter PAC definition of prediction intervals that allows us via simulation to achieve almost sharp bounds with high probability.
LGApr 21, 2021
Deep limits and cut-off phenomena for neural networksBenny Avelin, Anders Karlsson
We consider dynamical and geometrical aspects of deep learning. For many standard choices of layer maps we display semi-invariant metrics which quantify differences between data or decision functions. This allows us, when considering random layer maps and using non-commutative ergodic theorems, to deduce that certain limits exist when letting the number of layers tend to infinity. We also examine the random initialization of standard networks where we observe a surprising cut-off phenomenon in terms of the number of layers, the depth of the network. This could be a relevant parameter when choosing an appropriate number of layers for a given learning task, or for selecting a good initialization procedure. More generally, we hope that the notions and results in this paper can provide a framework, in particular a geometric one, for a part of the theoretical understanding of deep neural networks.
IVJan 18, 2021
Uncertainty-Aware Body Composition Analysis with Deep Regression Ensembles on UK Biobank MRITaro Langner, Fredrik K. Gustafsson, Benny Avelin et al.
Along with rich health-related metadata, medical images have been acquired for over 40,000 male and female UK Biobank participants, aged 44-82, since 2014. Phenotypes derived from these images, such as measurements of body composition from MRI, can reveal new links between genetics, cardiovascular disease, and metabolic conditions. In this work, six measurements of body composition and adipose tissues were automatically estimated by image-based, deep regression with ResNet50 neural networks from neck-to-knee body MRI. Despite the potential for high speed and accuracy, these networks produce no output segmentations that could indicate the reliability of individual measurements. The presented experiments therefore examine uncertainty quantification with mean-variance regression and ensembling to estimate individual measurement errors and thereby identify potential outliers, anomalies, and other failure cases automatically. In 10-fold cross-validation on data of about 8,500 subjects, mean-variance regression and ensembling showed complementary benefits, reducing the mean absolute error across all predictions by 12%. Both improved the calibration of uncertainties and their ability to identify high prediction errors. With intra-class correlation coefficients (ICC) above 0.97, all targets except the liver fat content yielded relative measurement errors below 5%. Testing on another 1,000 subjects showed consistent performance, and the method was finally deployed for inference to 30,000 subjects with missing reference values. The results indicate that deep regression ensembles could ultimately provide automated, uncertainty-aware measurements of body composition for more than 120,000 UK Biobank neck-to-knee body MRI that are to be acquired within the coming years.
APDec 15, 2020
Approximation of BV functions by neural networks: A regularity theory approachBenny Avelin, Vesa Julin
In this paper we are concerned with the approximation of functions by single hidden layer neural networks with ReLU activation functions on the unit circle. In particular, we are interested in the case when the number of data-points exceeds the number of nodes. We first study the convergence to equilibrium of the stochastic gradient flow associated with the cost function with a quadratic penalization. Specifically, we prove a Poincaré inequality for a penalized version of the cost function with explicit constants that are independent of the data and of the number of nodes. As our penalization biases the weights to be bounded, this leads us to study how well a network with bounded weights can approximate a given function of bounded variation (BV). Our main contribution concerning approximation of BV functions, is a result which we call the localization theorem. Specifically, it states that the expected error of the constrained problem, where the length of the weights are less than $R$, is of order $R^{-1/9}$ with respect to the unconstrained problem (the global optimum). The proof is novel in this topic and is inspired by techniques from regularity theory of elliptic partial differential equations. Finally we quantify the expected value of the global optimum by proving a quantitative version of the universal approximation theorem.
MLJun 28, 2019
Neural ODEs as the Deep Limit of ResNets with constant weightsBenny Avelin, Kaj Nyström
In this paper we prove that, in the deep limit, the stochastic gradient descent on a ResNet type deep neural network, where each layer shares the same weight matrix, converges to the stochastic gradient descent for a Neural ODE and that the corresponding value/loss functions converge. Our result gives, in the context of minimization by stochastic gradient descent, a theoretical foundation for considering Neural ODEs as the deep limit of ResNets. Our proof is based on certain decay estimates for associated Fokker-Planck equations.