Holger Rauhut

h-index40

33papers

1,609citations

Novelty42%

AI Score36

Ranked #96,016 of 194,257 authors (top 49%)#606 in NA (top 25%)

33 Papers

6.6NAJul 15, 2011

Low-rank matrix recovery via iteratively reweighted least squares minimization

Massimo Fornasier, Holger Rauhut, Rachel Ward

We present and analyze an efficient implementation of an iteratively reweighted least squares algorithm for recovering a matrix from a small number of linear measurements. The algorithm is designed for the simultaneous promotion of both a minimal nuclear norm and an approximatively low-rank solution. Under the assumption that the linear measurements fulfill a suitable generalization of the Null Space Property known in the context of compressed sensing, the algorithm is guaranteed to recover iteratively any matrix with an error of the order of the best k-rank approximation. In certain relevant cases, for instance for the matrix completion problem, our version of this algorithm can take advantage of the Woodbury matrix identity, which allows to expedite the solution of the least squares problems required at each iteration. We present numerical experiments that confirm the robustness of the algorithm for the solution of matrix completion problems, and demonstrate its competitiveness with respect to other techniques proposed recently in the literature.

4.3ITFeb 16, 2016

Low rank tensor recovery via iterative hard thresholding

Holger Rauhut, Reinhold Schneider, Zeljka Stojanac

We study extensions of compressive sensing and low rank matrix recovery (matrix completion) to the recovery of low rank tensors of higher order from a small number of linear measurements. While the theoretical understanding of low rank matrix recovery is already well-developed, only few contributions on the low rank tensor recovery problem are available so far. In this paper, we introduce versions of the iterative hard thresholding algorithm for several tensor decompositions, namely the higher order singular value decomposition (HOSVD), the tensor train format (TT), and the general hierarchical Tucker decomposition (HT). We provide a partial convergence result for these algorithms which is based on a variant of the restricted isometry property of the measurement operator adapted to the tensor decomposition at hand that induces a corresponding notion of tensor rank. We show that subgaussian measurement ensembles satisfy the tensor restricted isometry property with high probability under a certain almost optimal bound on the number of measurements which depends on the corresponding tensor format. These bounds are extended to partial Fourier maps combined with random sign flips of the tensor entries. Finally, we illustrate the performance of iterative hard thresholding methods for tensor recovery via numerical experiments where we consider recovery from Gaussian random measurements, tensor completion (recovery of missing entries), and Fourier measurements for third order tensors.

1.2NADec 15, 2017

Multi-level Compressed Sensing Petrov-Galerkin discretization of high-dimensional parametric PDEs

Jean-Luc Bouchot, Holger Rauhut, Christoph Schwab

We analyze a novel multi-level version of a recently introduced compressed sensing (CS) Petrov-Galerkin (PG) method from [H. Rauhut and Ch. Schwab: Compressive Sensing Petrov-Galerkin approximation of high-dimensional parametric operator equations, Math. Comp. 304(2017) 661-700] for the solution of many-parametric partial differential equations. We propose to use multi-level PG discretizations, based on a hierarchy of nested finite dimensional subspaces, and to reconstruct parametric solutions at each level from level-dependent random samples of the high-dimensional parameter space via CS methods such as weighted l1-minimization. For affine parametric, linear operator equations, we prove that our approach allows to approximate the parametric solution with (almost) optimal convergence order as specified by certain summability properties of the coefficient sequence in a general polynomial chaos expansion of the parametric solution and by the convergence order of the PG discretization in the physical variables. The computations of the parameter samples of the PDE solution is "embarrassingly parallel", as in Monte-Carlo Methods. Contrary to other recent approaches, and as already noted in [A. Doostan and H. Owhadi: A non-adapted sparse approximation of PDEs with stochastic inputs. JCP 230(2011) 3015-3034] the optimality of the computed approximations does not require a-priori assumptions on ordering and structure of the index sets of the largest gpc coefficients (such as the "downward closed" property). We prove that under certain assumptions work versus accuracy of the new algorithms is asymptotically equal to that of one PG solve for the corresponding nominal problem on the finest discretization level up to a constant.

5.9NAFeb 20, 2011

Sparse recovery for spherical harmonic expansions

Holger Rauhut, Rachel Ward

We show that sparse spherical harmonic expansions can be efficiently recovered from a small number of randomly chosen samples on the sphere. To establish the main result, we verify the restricted isometry property of an associated preconditioned random measurement matrix using recent estimates on the uniform growth of Jacobi polynomials.

1.2ITApr 24, 2013

Remote sensing via $\ell_1$ minimization

Max Hügel, Holger Rauhut, Thomas Strohmer

We consider the problem of detecting the locations of targets in the far field by sending probing signals from an antenna array and recording the reflected echoes. Drawing on key concepts from the area of compressive sensing, we use an $\ell_1$-based regularization approach to solve this, in general ill-posed, inverse scattering problem. As common in compressed sensing, we exploit randomness, which in this context comes from choosing the antenna locations at random. With $n$ antennas we obtain $n^2$ measurements of a vector $x \in \C^{N}$ representing the target locations and reflectivities on a discretized grid. It is common to assume that the scene $x$ is sparse due to a limited number of targets. Under a natural condition on the mesh size of the grid, we show that an $s$-sparse scene can be recovered via $\ell_1$-minimization with high probability if $n^2 \geq C s \log^2(N)$. The reconstruction is stable under noise and under passing from sparse to approximately sparse vectors. Our theoretical findings are confirmed by numerical simulations.

1.2NAFeb 23, 2016

Conjugate gradient acceleration of iteratively re-weighted least squares methods

Massimo Fornasier, Steffen Peter, Holger Rauhut et al.

Iteratively Re-weighted Least Squares (IRLS) is a method for solving minimization problems involving non-quadratic cost functions, perhaps non-convex and non-smooth, which however can be described as the infimum over a family of quadratic functions. This transformation suggests an algorithmic scheme that solves a sequence of quadratic problems to be tackled efficiently by tools of numerical linear algebra. Its general scope and its usually simple implementation, transforming the initial non-convex and non-smooth minimization problem into a more familiar and easily solvable quadratic optimization problem, make it a versatile algorithm. However, despite its simplicity, versatility, and elegant analysis, the complexity of IRLS strongly depends on the way the solution of the successive quadratic optimizations is addressed. For the important special case of $\textit{compressed sensing}$ and sparse recovery problems in signal processing, we investigate theoretically and numerically how accurately one needs to solve the quadratic problems by means of the $\textit{conjugate gradient}$ (CG) method in each iteration in order to guarantee convergence. The use of the CG method may significantly speed-up the numerical solution of the quadratic subproblems, in particular, when fast matrix-vector multiplication (exploiting for instance the FFT) is available for the matrix involved. In addition, we study convergence rates. Our modified IRLS method outperforms state of the art first order methods such as Iterative Hard Thresholding (IHT) or Fast Iterative Soft-Thresholding Algorithm (FISTA) in many situations, especially in large dimensions. Moreover, IRLS is often able to recover sparse vectors from fewer measurements than required for IHT and FISTA.

1.2NAFeb 17, 2013

Fast and RIP-optimal transforms

Nir Ailon, Holger Rauhut

We study constructions of $k \times n$ matrices $A$ that both (1) satisfy the restricted isometry property (RIP) at sparsity $s$ with optimal parameters, and (2) are efficient in the sense that only $O(n\log n)$ operations are required to compute $Ax$ given a vector $x$. Our construction is based on repeated application of independent transformations of the form $DH$, where $H$ is a Hadamard or Fourier transform and $D$ is a diagonal matrix with random $\{+1,-1\}$ elements on the diagonal, followed by any $k \times n$ matrix of orthonormal rows (e.g.\ selection of $k$ coordinates). We provide guarantees (1) and (2) for a larger regime of parameters for which such constructions were previously unknown. Additionally, our construction does not suffer from the extra poly-logarithmic factor multiplying the number of observations $k$ as a function of the sparsity $s$, as present in the currently best known RIP estimates for partial random Fourier matrices and other classes of structured random matrices.

18.3OCJun 22, 2023

Don't be so Monotone: Relaxing Stochastic Line Search in Over-Parameterized Models

Leonardo Galli, Holger Rauhut, Mark Schmidt

Recent works have shown that line search methods can speed up Stochastic Gradient Descent (SGD) and Adam in modern over-parameterized settings. However, existing line searches may take steps that are smaller than necessary since they require a monotone decrease of the (mini-)batch objective function. We explore nonmonotone line search methods to relax this condition and possibly accept larger step sizes. Despite the lack of a monotonic decrease, we prove the same fast rates of convergence as in the monotone case. Our experiments show that nonmonotone methods improve the speed of convergence and generalization properties of SGD/Adam even beyond the previous monotone line searches. We propose a POlyak NOnmonotone Stochastic (PoNoS) method, obtained by combining a nonmonotone line search with a Polyak initial step size. Furthermore, we develop a new resetting technique that in the majority of the iterations reduces the amount of backtracks to zero while still maintaining a large initial step size. To the best of our knowledge, a first runtime comparison shows that the epoch-wise advantage of line-search-based methods gets reflected in the overall computational time.

2.3NAApr 2, 2011

Sparse Legendre expansions via $\ell_1$ minimization

Holger Rauhut, Rachel Ward

We consider the problem of recovering polynomials that are sparse with respect to the basis of Legendre polynomials from a small number of random samples. In particular, we show that a Legendre s-sparse polynomial of maximal degree N can be recovered from m = O(s log^4(N)) random samples that are chosen independently according to the Chebyshev probability measure. As an efficient recovery method, l1-minimization can be used. We establish these results by verifying the restricted isometry property of a preconditioned random Legendre matrix. We then extend these results to a large class of orthogonal polynomial systems, including the Jacobi polynomials, of which the Legendre polynomials are a special case. Finally, we transpose these results into the setting of approximate recovery for functions in certain infinite-dimensional function spaces.

4.3SPJul 18, 2024

High-Dimensional Confidence Regions in Sparse MRI

Frederik Hoppe, Felix Krahmer, Claudio Mayrink Verdun et al.

One of the most promising solutions for uncertainty quantification in high-dimensional statistics is the debiased LASSO that relies on unconstrained $\ell_1$-minimization. The initial works focused on real Gaussian designs as a toy model for this problem. However, in medical imaging applications, such as compressive sensing for MRI, the measurement system is represented by a (subsampled) complex Fourier matrix. The purpose of this work is to extend the method to the MRI case in order to construct confidence intervals for each pixel of an MR image. We show that a sufficient amount of data is $n \gtrsim \max\{ s_0\log^2 s_0\log p, s_0 \log^2 p \}$.

5.9MLSep 14, 2023

Uncertainty quantification for learned ISTA

Frederik Hoppe, Claudio Mayrink Verdun, Felix Krahmer et al.

Model-based deep learning solutions to inverse problems have attracted increasing attention in recent years as they bridge state-of-the-art numerical performance with interpretability. In addition, the incorporated prior domain knowledge can make the training more efficient as the smaller number of parameters allows the training step to be executed with smaller datasets. Algorithm unrolling schemes stand out among these model-based learning techniques. Despite their rapid advancement and their close connection to traditional high-dimensional statistical methods, they lack certainty estimates and a theory for uncertainty quantification is still elusive. This work provides a step towards closing this gap proposing a rigorous way to obtain confidence intervals for the LISTA estimator.

4.6LGJul 18, 2024Code

Non-Asymptotic Uncertainty Quantification in High-Dimensional Learning

Frederik Hoppe, Claudio Mayrink Verdun, Hannah Laus et al.

Uncertainty quantification (UQ) is a crucial but challenging task in many high-dimensional regression or learning problems to increase the confidence of a given predictor. We develop a new data-driven approach for UQ in regression that applies both to classical regression approaches such as the LASSO as well as to neural networks. One of the most notable UQ techniques is the debiased LASSO, which modifies the LASSO to allow for the construction of asymptotic confidence intervals by decomposing the estimation error into a Gaussian and an asymptotically vanishing bias component. However, in real-world problems with finite-dimensional data, the bias term is often too significant to be neglected, resulting in overly narrow confidence intervals. Our work rigorously addresses this issue and derives a data-driven adjustment that corrects the confidence intervals for a large class of predictors by estimating the means and variances of the bias terms from training data, exploiting high-dimensional concentration phenomena. This gives rise to non-asymptotic confidence intervals, which can help avoid overestimating uncertainty in critical applications such as MRI diagnosis. Importantly, our analysis extends beyond sparse regression to data-driven predictors like neural networks, enhancing the reliability of model-based deep learning. Our findings bridge the gap between established theory and the practical applicability of such debiased methods.

1.2SPJul 18, 2024

With or Without Replacement? Improving Confidence in Fourier Imaging

Frederik Hoppe, Claudio Mayrink Verdun, Felix Krahmer et al.

Over the last few years, debiased estimators have been proposed in order to establish rigorous confidence intervals for high-dimensional problems in machine learning and data science. The core argument is that the error of these estimators with respect to the ground truth can be expressed as a Gaussian variable plus a remainder term that vanishes as long as the dimension of the problem is sufficiently high. Thus, uncertainty quantification (UQ) can be performed exploiting the Gaussian model. Empirically, however, the remainder term cannot be neglected in many realistic situations of moderately-sized dimensions, in particular in certain structured measurement scenarios such as Magnetic Resonance Imaging (MRI). This, in turn, can downgrade the advantage of the UQ methods as compared to non-UQ approaches such as the standard LASSO. In this paper, we present a method to improve the debiased estimator by sampling without replacement. Our approach leverages recent results of ours on the structure of the random nature of certain sampling schemes showing how a transition between sampling with and without replacement can lead to a weighted reconstruction scheme with improved performance for the standard LASSO. In this paper, we illustrate how this reweighted sampling idea can also improve the debiased estimator and, consequently, provide a better method for UQ in Fourier imaging.

9.4LGJul 8, 2025

The Riemannian Geometry associated to Gradient Flows of Linear Convolutional Networks

El Mehdi Achour, Kathlén Kohn, Holger Rauhut

We study geometric properties of the gradient flow for learning deep linear convolutional networks. For linear fully connected networks, it has been shown recently that the corresponding gradient flow on parameter space can be written as a Riemannian gradient flow on function space (i.e., on the product of weight matrices) if the initialization satisfies a so-called balancedness condition. We establish that the gradient flow on parameter space for learning linear convolutional networks can be written as a Riemannian gradient flow on function space regardless of the initialization. This result holds for $D$-dimensional convolutions with $D \geq 2$, and for $D =1$ it holds if all so-called strides of the convolutions are greater than one. The corresponding Riemannian metric depends on the initialization.

11.5LGMay 9, 2023

Robust Implicit Regularization via Weight Normalization

Hung-Hsu Chou, Holger Rauhut, Rachel Ward

Overparameterized models may have many interpolating solutions; implicit regularization refers to the hidden preference of a particular optimization method towards a certain interpolating solution among the many. A by now established line of work has shown that (stochastic) gradient descent tends to have an implicit bias towards low rank and/or sparse solutions when used to train deep linear networks, explaining to some extent why overparameterized neural network models trained by gradient descent tend to have good generalization performance in practice. However, existing theory for square-loss objectives often requires very small initialization of the trainable weights, which is at odds with the larger scale at which weights are initialized in practice for faster convergence and better generalization performance. In this paper, we aim to close this gap by incorporating and analyzing gradient flow (continuous-time version of gradient descent) with weight normalization, where the weight vector is reparameterized in terms of polar coordinates, and gradient flow is applied to the polar coordinates. By analyzing key invariants of the gradient flow and using Lojasiewicz Theorem, we show that weight normalization also has an implicit bias towards sparse solutions in the diagonal linear model, but that in contrast to plain gradient flow, weight normalization enables a robust bias that persists even if the weights are initialized at practically large scale. Experiments suggest that the gains in both convergence speed and robustness of the implicit bias are improved dramatically by using weight normalization in overparameterized diagonal linear network models.

19.1OCDec 21, 2021

More is Less: Inducing Sparsity via Overparameterization

Hung-Hsu Chou, Johannes Maly, Holger Rauhut

In deep learning it is common to overparameterize neural networks, that is, to use more parameters than training samples. Quite surprisingly training the neural network via (stochastic) gradient descent leads to models that generalize very well, while classical statistics would suggest overfitting. In order to gain understanding of this implicit bias phenomenon we study the special case of sparse recovery (compressed sensing) which is of interest on its own. More precisely, in order to reconstruct a vector from underdetermined linear measurements, we introduce a corresponding overparameterized square loss functional, where the vector to be reconstructed is deeply factorized into several vectors. We show that, if there exists an exact solution, vanilla gradient flow for the overparameterized loss functional converges to a good approximation of the solution of minimal $\ell_1$-norm. The latter is well-known to promote sparse solutions. As a by-product, our results significantly improve the sample complexity for compressed sensing via gradient flow/descent on overparameterized models derived in previous works. The theory accurately predicts the recovery rate in numerical experiments. Our proof relies on analyzing a certain Bregman divergence of the flow. This bypasses the obstacles caused by non-convexity and should be of independent interest.

9.2LGDec 8, 2021

Generalization Error Bounds for Iterative Recovery Algorithms Unfolded as Neural Networks

Ekkehard Schnoor, Arash Behboodi, Holger Rauhut

Motivated by the learned iterative soft thresholding algorithm (LISTA), we introduce a general class of neural networks suitable for sparse reconstruction from few linear measurements. By allowing a wide range of degrees of weight-sharing between the layers, we enable a unified analysis for very different neural network types, ranging from recurrent ones to networks more similar to standard feedforward neural networks. Based on training samples, via empirical risk minimization we aim at learning the optimal network parameters and thereby the optimal network that reconstructs signals from their low-dimensional linear measurements. We derive generalization bounds by analyzing the Rademacher complexity of hypothesis classes consisting of such deep networks, that also take into account the thresholding parameters. We obtain estimates of the sample complexity that essentially depend only linearly on the number of parameters and on the depth. We apply our main result to obtain specific generalization bounds for several practical examples, including different algorithms for (implicit) dictionary learning, and convolutional neural networks.

1.2NAOct 13, 2021

Spark Deficient Gabor Frames for Inverse Problems

Vasiliki Kouni, Holger Rauhut

In this paper, we apply star-Digital Gabor Transform in analysis Compressed Sensing and speech denoising. Based on assumptions on the ambient dimension, we produce a window vector that generates a spark deficient Gabor frame with many linear dependencies among its elements. We conduct computational experiments on both synthetic and real-world signals, using as baseline three Gabor transforms generated by state-of-the-art window vectors and compare their performance to star-Gabor transform. Results show that the proposed star-Gabor transform outperforms all others in all signal cases.

6.6ITOct 13, 2021Code

ADMM-DAD net: a deep unfolding network for analysis compressed sensing

Vasiliki Kouni, Georgios Paraskevopoulos, Holger Rauhut et al.

In this paper, we propose a new deep unfolding neural network based on the ADMM algorithm for analysis Compressed Sensing. The proposed network jointly learns a redundant analysis operator for sparsification and reconstructs the signal of interest. We compare our proposed network with a state-of-the-art unfolded ISTA decoder, that also learns an orthogonal sparsifier. Moreover, we consider not only image, but also speech datasets as test examples. Computational experiments demonstrate that our proposed network outperforms the state-of-the-art deep unfolding network, consistently for both real-world image and speech datasets.

3.6MLAug 6, 2021

Path classification by stochastic linear recurrent neural networks

Wiebke Bartolomaeus, Youness Boutaib, Sandra Nestler et al.

We investigate the functioning of a classifying biological neural network from the perspective of statistical learning theory, modelled, in a simplified setting, as a continuous-time stochastic recurrent neural network (RNN) with identity activation function. In the purely stochastic (robust) regime, we give a generalisation error bound that holds with high probability, thus showing that the empirical risk minimiser is the best-in-class hypothesis. We show that RNNs retain a partial signature of the paths they are fed as the unique information exploited for training and classification tasks. We argue that these RNNs are easy to train and robust and back these observations with numerical experiments on both synthetic and real data. We also exhibit a trade-off phenomenon between accuracy and robustness.

13.1LGAug 4, 2021

Convergence of gradient descent for learning linear neural networks

Gabin Maxime Nguegnang, Holger Rauhut, Ulrich Terstiege

We study the convergence properties of gradient descent for training deep linear neural networks, i.e., deep matrix factorizations, by extending a previous analysis for the related gradient flow. We show that under suitable conditions on the step sizes gradient descent converges to a critical point of the loss function, i.e., the square loss in this article. Furthermore, we demonstrate that for almost all initializations gradient descent converges to a global minimum in the case of two layers. In the case of three or more layers we show that gradient descent converges to a global minimum on the manifold matrices of some fixed rank, where the rank cannot be determined a priori.

4.3SDApr 29, 2021

Star DGT: a Robust Gabor Transform for Speech Denoising

Vasiliki Kouni, Holger Rauhut, Theoharis Theoharis

In this paper, we address the speech denoising problem, where Gaussian, pink and blue additive noises are to be removed from a given speech signal. Our approach is based on a redundant, analysis-sparse representation of the original speech signal. We pick an eigenvector of the Zauner unitary matrix and -- under certain assumptions on the ambient dimension -- we use it as window vector to generate a spark deficient Gabor frame. The analysis operator associated with such a frame, is a (highly) redundant Gabor transform, which we use as a sparsifying transform in denoising procedure. We conduct computational experiments on real-world speech data, using as baseline three Gabor transforms generated by state-of-the-art window vectors in time-frequency analysis and compare their performance to the proposed Gabor transform. The results show that our proposed redundant Gabor transform outperforms all others, consistently for all signals.

18.6LGNov 27, 2020

Gradient Descent for Deep Matrix Factorization: Dynamics and Implicit Bias towards Low Rank

Hung-Hsu Chou, Carsten Gieshoff, Johannes Maly et al.

In deep learning, it is common to use more network parameters than training points. In such scenarioof over-parameterization, there are usually multiple networks that achieve zero training error so that thetraining algorithm induces an implicit bias on the computed solution. In practice, (stochastic) gradientdescent tends to prefer solutions which generalize well, which provides a possible explanation of thesuccess of deep learning. In this paper we analyze the dynamics of gradient descent in the simplifiedsetting of linear networks and of an estimation problem. Although we are not in an overparameterizedscenario, our analysis nevertheless provides insights into the phenomenon of implicit bias. In fact, wederive a rigorous analysis of the dynamics of vanilla gradient descent, and characterize the dynamicalconvergence of the spectrum. We are able to accurately locate time intervals where the effective rankof the iterates is close to the effective rank of a low-rank projection of the ground-truth matrix. Inpractice, those intervals can be used as criteria for early stopping if a certain regularity is desired. Wealso provide empirical evidence for implicit bias in more general scenarios, such as matrix sensing andrandom initialization. This suggests that deep learning prefers trajectories whose complexity (measuredin terms of effective rank) is monotonically increasing, which we believe is a fundamental concept for thetheoretical understanding of deep learning.

11.3STOct 29, 2020

Compressive Sensing and Neural Networks from a Statistical Learning Perspective

Arash Behboodi, Holger Rauhut, Ekkehard Schnoor

Various iterative reconstruction algorithms for inverse problems can be unfolded as neural networks. Empirically, this approach has often led to improved results, but theoretical guarantees are still scarce. While some progress on generalization properties of neural networks have been made, great challenges remain. In this chapter, we discuss and combine these topics to present a generalization error analysis for a class of neural networks suitable for sparse reconstruction from few linear measurements. The hypothesis class considered is inspired by the classical iterative soft-thresholding algorithm (ISTA). The neural networks in this class are obtained by unfolding iterations of ISTA and learning some of the weights. Based on training samples, we aim at learning the optimal network parameters via empirical risk minimization and thereby the optimal network that reconstructs signals from their compressive linear measurements. In particular, we may learn a sparsity basis that is shared by all of the iterations/layers and thereby obtain a new approach for dictionary learning. For this class of networks, we present a generalization bound, which is based on bounding the Rademacher complexity of hypothesis classes consisting of such deep networks via Dudley's integral. Remarkably, under realistic conditions, the generalization error scales only logarithmically in the number of layers, and at most linear in number of measurements.

2.3DIS-NNOct 13, 2020

Unfolding recurrence by Green's functions for optimized reservoir computing

Sandra Nestler, Christian Keup, David Dahmen et al.

Cortical networks are strongly recurrent, and neurons have intrinsic temporal dynamics. This sets them apart from deep feed-forward networks. Despite the tremendous progress in the application of feed-forward networks and their theoretical understanding, it remains unclear how the interplay of recurrence and non-linearities in recurrent cortical networks contributes to their function. The purpose of this work is to present a solvable recurrent network model that links to feed forward networks. By perturbative methods we transform the time-continuous, recurrent dynamics into an effective feed-forward structure of linear and non-linear temporal kernels. The resulting analytical expressions allow us to build optimal time-series classifiers from random reservoir networks. Firstly, this allows us to optimize not only the readout vectors, but also the input projection, demonstrating a strong potential performance gain. Secondly, the analysis exposes how the second order stimulus statistics is a crucial element that interacts with the non-linearity of the dynamics and boosts performance.

7.2LGJun 15, 2020

Overparameterization and generalization error: weighted trigonometric interpolation

Yuege Xie, Hung-Hsu Chou, Holger Rauhut et al.

Motivated by surprisingly good generalization properties of learned deep neural networks in overparameterized scenarios and by the related double descent phenomenon, this paper analyzes the relation between smoothness and low generalization error in an overparameterized linear learning problem. We study a random Fourier series model, where the task is to estimate the unknown Fourier coefficients from equidistant samples. We derive exact expressions for the generalization error of both plain and weighted least squares estimators. We show precisely how a bias towards smooth interpolants, in the form of weighted trigonometric interpolation, can lead to smaller generalization error in the overparameterized regime compared to the underparameterized regime. This provides insight into the power of overparameterization, which is common in modern machine learning.

25.9OCOct 12, 2019

Learning deep linear neural networks: Riemannian gradient flows and convergence to global minimizers

Bubacarr Bah, Holger Rauhut, Ulrich Terstiege et al.

We study the convergence of gradient flows related to learning deep linear neural networks (where the activation function is the identity map) from data. In this case, the composition of the network layers amounts to simply multiplying the weight matrices of all layers together, resulting in an overparameterized problem. The gradient flow with respect to these factors can be re-interpreted as a Riemannian gradient flow on the manifold of rank-$r$ matrices endowed with a suitable Riemannian metric. We show that the flow always converges to a critical point of the underlying functional. Moreover, we establish that, for almost all initializations, the flow converges to a global minimum on the manifold of rank $k$ matrices for some $k\leq r$.

1.2NASep 21, 2015

Compressive sensing Petrov-Galerkin approximation of high-dimensional parametric operator equations

Holger Rauhut, Christoph Schwab

We analyze the convergence of compressive sensing based sampling techniques for the efficient evaluation of functionals of solutions for a class of high-dimensional, affine-parametric, linear operator equations which depend on possibly infinitely many parameters. The proposed algorithms are based on so-called "non-intrusive" sampling of the high-dimensional parameter space, reminiscent of Monte-Carlo sampling. In contrast to Monte-Carlo, however, a functional of the parametric solution is then computed via compressive sensing methods from samples of functionals of the solution. A key ingredient in our analysis of independent interest consists in a generalization of recent results on the approximate sparsity of generalized polynomial chaos representations (gpc) of the parametric solution families, in terms of the gpc series with respect to tensorized Chebyshev polynomials. In particular, we establish sufficient conditions on the parametric inputs to the parametric operator equation such that the Chebyshev coefficients of the gpc expansion are contained in certain weighted $\ell_p$-spaces for $0<p\leq 1$. Based on this we show that reconstructions of the parametric solutions computed from the sampled problems converge, with high probability, at the $L_2$, resp. $L_\infty$ convergence rates afforded by best $s$-term approximations of the parametric solution up to logarithmic factors.

1.2FAMar 26, 2015

Interpolation via weighted $l_1$ minimization

Holger Rauhut, Rachel Ward

Functions of interest are often smooth and sparse in some sense, and both priors should be taken into account when interpolating sampled data. Classical linear interpolation methods are effective under strong regularity assumptions, but cannot incorporate nonlinear sparsity structure. At the same time, nonlinear methods such as $l_1$ minimization can reconstruct sparse functions from very few samples, but do not necessarily encourage smoothness. Here we show that weighted $l_1$ minimization effectively merges the two approaches, promoting both sparsity and smoothness in reconstruction. More precisely, we provide specific choices of weights in the $l_1$ objective to achieve rates for functions with coefficient sequences in weighted $l_p$ spaces, $p<=1$. We consider the implications of these results for spherical harmonic and polynomial interpolation, in the univariate and multivariate setting. Along the way, we extend concepts from compressive sensing such as the restricted isometry property and null space property to accommodate weighted sparse expansions; these developments should be of independent interest in the study of structured sparse approximations and continuous-time compressive sensing problems.

1.2NANov 3, 2014

Tensor completion in hierarchical tensor representations

Holger Rauhut, Reinhold Schneider, Zeljka Stojanac

Compressed sensing extends from the recovery of sparse vectors from undersampled measurements via efficient algorithms to the recovery of matrices of low rank from incomplete information. Here we consider a further extension to the reconstruction of tensors of low multi-linear rank in recently introduced hierarchical tensor formats from a small number of measurements. Hierarchical tensors are a flexible generalization of the well-known Tucker representation, which have the advantage that the number of degrees of freedom of a low rank tensor does not scale exponentially with the order of the tensor. While corresponding tensor decompositions can be computed efficiently via successive applications of (matrix) singular value decompositions, some important properties of the singular value decomposition do not extend from the matrix to the tensor case. This results in major computational and theoretical difficulties in designing and analyzing algorithms for low rank tensor recovery. For instance, a canonical analogue of the tensor nuclear norm is NP-hard to compute in general, which is in stark contrast to the matrix case. In this book chapter we consider versions of iterative hard thresholding schemes adapted to hierarchical tensor formats. A variant builds on methods from Riemannian optimization and uses a retraction mapping from the tangent space of the manifold of low rank tensors back to this manifold. We provide first partial convergence results based on a tensor version of the restricted isometry property (TRIP) of the measurement map. Moreover, an estimate of the number of measurements is provided that ensures the TRIP of a given tensor rank with high probability for Gaussian measurement maps.

6.6NAApr 6, 2008

Stability results for random sampling of sparse trigonometric polynomials

Holger Rauhut

Recently, it has been observed that a sparse trigonometric polynomial, i.e. having only a small number of non-zero coefficients, can be reconstructed exactly from a small number of random samples using Basis Pursuit (BP) or Orthogonal Matching Pursuit (OMP). In the present article it is shown that recovery by a BP variant is stable under perturbation of the samples values by noise. A similar partial result for OMP is provided. For BP in addition, the stability result is extended to (non-sparse) trigonometric polynomials that can be well-approximated by sparse ones. The theoretical findings are illustrated by numerical experiments.

11.3CAFeb 25, 2007

Random Sampling of Sparse Trigonometric Polynomials II - Orthogonal Matching Pursuit versus Basis Pursuit

Stefan Kunis, Holger Rauhut

We investigate the problem of reconstructing sparse multivariate trigonometric polynomials from few randomly taken samples by Basis Pursuit and greedy algorithms such as Orthogonal Matching Pursuit (OMP) and Thresholding. While recovery by Basis Pursuit has recently been studied by several authors, we provide theoretical results on the success probability of reconstruction via Thresholding and OMP for both a continuous and a discrete probability model for the sampling points. We present numerical experiments, which indicate that usually Basis Pursuit is significantly slower than greedy algorithms, while the recovery rates are very similar.

9.2NAAug 7, 2006

Recovery algorithms for vector valued data with joint sparsity constraints

Massimo Fornasier, Holger Rauhut

Vector valued data appearing in concrete applications often possess sparse expansions with respect to a preassigned frame for each vector component individually. Additionally, different components may also exhibit common sparsity patterns. Recently, there were introduced sparsity measures that take into account such joint sparsity patterns, promoting coupling of non-vanishing components. These measures are typically constructed as weighted $\ell_1$ norms of componentwise $\ell_q$ norms of frame coefficients. We show how to compute solutions of linear inverse problems with such joint sparsity regularization constraints by fast thresholded Landweber algorithms. Next we discuss the adaptive choice of suitable weights appearing in the definition of sparsity measures. The weights are interpreted as indicators of the sparsity pattern and are iteratively up-dated after each new application of the thresholded Landweber algorithm. The resulting two-step algorithm is interpreted as a double-minimization scheme for a suitable target functional. We show its $\ell_2$-norm convergence. An implementable version of the algorithm is also formulated, and its norm convergence is proven. Numerical experiments in color image restoration are presented.