NA · Sep 26, 2012
Tight p-fusion frames
Christine Bachoc, Martin Ehler
Fusion frames enable signal decompositions into weighted linear subspace components. For positive integers p, we introduce p-fusion frames, a sharpening of the notion of fusion frames. Tight p-fusion frames are closely related to the classical notions of designs and cubature formulas in Grassmann spaces and are analyzed with methods from harmonic analysis in the Grassmannians. We define the p-fusion frame potential, derive bounds for its value, and discuss the connections to tight p-fusion frames.
NA · Dec 30, 2010
Shrinkage Rules for Variational Minimization Problems and Applications to Analytical Ultracentrifugation
Martin Ehler
Finding a sparse representation of a possibly noisy signal can be modeled as a variational minimization with l_q-sparsity constraints for q less than one. Especially for real-time, on-line, or iterative applications, in which problems of this type have to be solved multiple times, one needs fast algorithms to compute these minimizers. Identifying the exact minimizers is computationally expensive. We consider minimization up to a constant factor to circumvent this limitation. We verify that q-dependent modifications of shrinkage rules provide closed formulas for such minimizers. Therefore, their computation is extremely fast. We also introduce a new shrinkage rule which is adapted to q. To support the theoretical results, the proposed method is applied to Landweber iteration with shrinkage used at each iteration step. This approach is utilized to solve the ill-posed problem of analytic ultracentrifugation, a method to determine the size distribution of macromolecules. For relatively pure solutes, our proposed scheme leads to sparser solutions with sharper peaks, higher resolution, and smaller residuals than standard regularization for this problem.
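As a rough illustration of Landweber iteration with a shrinkage step, the sketch below uses the standard soft-shrinkage rule, which corresponds to the q = 1 case. It is not the q-dependent rule proposed in the paper; the function names, the test matrix, and the value of `alpha` are assumptions chosen only for this example.

```python
import numpy as np

def soft_shrink(x, threshold):
    """Soft shrinkage: the closed-form minimizer for the q = 1 penalty."""
    return np.sign(x) * np.maximum(np.abs(x) - threshold, 0.0)

def landweber_shrinkage(A, y, alpha, n_iter=500):
    """Landweber iteration with a shrinkage step applied after each update."""
    # A step size of 1 / ||A||_2^2 keeps the iteration stable.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        # Landweber (gradient) update on the residual, then shrinkage.
        x = soft_shrink(x + step * A.T @ (y - A @ x), step * alpha)
    return x

# Hypothetical test problem: a sparse signal observed through a random matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[[5, 37, 80]] = [2.0, -1.5, 1.0]
y = A @ x_true
x_hat = landweber_shrinkage(A, y, alpha=0.1)
```

Because the shrinkage step is a closed formula applied entrywise, each iteration costs little more than two matrix-vector products; replacing `soft_shrink` by another rule leaves the loop unchanged.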
LG · Jul 18, 2023
Convex Geometry of ReLU-layers, Injectivity on the Ball and Local Reconstruction
Daniel Haider, Martin Ehler, Peter Balazs
The paper uses a frame-theoretic setting to study the injectivity of a ReLU-layer on the closed ball of $\mathbb{R}^n$ and its non-negative part. In particular, the interplay between the radius of the ball and the bias vector is emphasized. Together with a perspective from convex geometry, this leads to a computationally feasible method of verifying the injectivity of a ReLU-layer under reasonable restrictions in terms of an upper bound of the bias vector. Explicit reconstruction formulas are provided, inspired by the duality concept from frame theory. All this gives rise to the possibility of quantifying the invertibility of a ReLU-layer and a concrete reconstruction algorithm for any input vector on the ball.
SD · Jul 25, 2023
Fitting Auditory Filterbanks with Multiresolution Neural Networks
Vincent Lostanlen, Daniel Haider, Han Han et al.
Waveform-based deep learning faces a dilemma between nonparametric and parametric approaches. On one hand, convolutional neural networks (convnets) may approximate any linear time-invariant system; yet, in practice, their frequency responses become more irregular as their receptive fields grow. On the other hand, a parametric model such as LEAF is guaranteed to yield Gabor filters, hence an optimal time-frequency localization; yet, this strong inductive bias comes at the detriment of representational capacity. In this paper, we aim to overcome this dilemma by introducing a neural audio model, named multiresolution neural network (MuReNN). The key idea behind MuReNN is to train separate convolutional operators over the octave subbands of a discrete wavelet transform (DWT). Since the scale of DWT atoms grows exponentially between octaves, the receptive fields of the subsequent learnable convolutions in MuReNN are dilated accordingly. For a given real-world dataset, we fit the magnitude response of MuReNN to that of a well-established auditory filterbank: Gammatone for speech, CQT for music, and third-octave for urban sounds, respectively. This is a form of knowledge distillation (KD), in which the filterbank "teacher" is engineered by domain knowledge while the neural network "student" is optimized from data. We compare MuReNN to the state of the art in terms of goodness of fit after KD on a hold-out set and in terms of Heisenberg time-frequency localization. Compared to convnets and Gabor convolutions, we find that MuReNN reaches state-of-the-art performance on all three optimization problems.
CV · Nov 7, 2022
visClust: A visual clustering algorithm based on orthogonal projections
Anna Breger, Clemens Karner, Martin Ehler
We present a novel clustering algorithm, visClust, that is based on lower dimensional data representations and visual interpretation. To this end, we design a transformation that allows the data to be represented by a binary integer array, enabling the use of image processing methods to select a partition. Qualitative and quantitative analyses, measured in accuracy and an adjusted Rand-Index, show that the algorithm performs well while requiring low runtime and RAM. We compare the results to 6 state-of-the-art algorithms with available code, confirming the quality of visClust by superior performance in most experiments. Moreover, the algorithm asks for just one obligatory input parameter while allowing optimization via optional parameters. The code is made available on GitHub and is straightforward to use.
LG · Sep 11, 2023
Instabilities in Convnets for Raw Audio
Daniel Haider, Vincent Lostanlen, Martin Ehler et al.
What makes waveform-based deep learning so hard? Despite numerous attempts at training convolutional neural networks (convnets) for filterbank design, they often fail to outperform hand-crafted baselines. These baselines are linear time-invariant systems: as such, they can be approximated by convnets with wide receptive fields. Yet, in practice, gradient-based optimization leads to suboptimal approximations. In our article, we approach this phenomenon from the perspective of initialization. We present a theory of large deviations for the energy response of FIR filterbanks with random Gaussian weights. We find that deviations worsen for large filters and locally periodic input signals, which are both typical for audio signal processing applications. Numerical simulations align with our theory and suggest that the condition number of a convolutional layer follows a logarithmic scaling law between the number and length of the filters, which is reminiscent of discrete wavelet bases.
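The condition number mentioned above can be probed numerically. The sketch below assumes stride-1 circular convolution, for which the frame operator of a filterbank is diagonalized by the DFT, so the frame bounds are the minimum and maximum over frequency of the summed squared magnitude responses. The function name, the filter counts and lengths, and the variance scaling are assumptions for illustration, not the experimental setup of the paper.

```python
import numpy as np

def filterbank_condition_number(filters, signal_length):
    """Ratio B / A of the frame bounds of a stride-1 circular filterbank."""
    # Zero-pad every filter to the signal length and take its DFT.
    responses = np.fft.fft(filters, n=signal_length, axis=1)
    # Eigenvalues of the frame operator: the energy response, i.e. the
    # squared magnitude responses summed over filters, per frequency bin.
    energy = np.sum(np.abs(responses) ** 2, axis=0)
    return energy.max() / energy.min()

rng = np.random.default_rng(0)
num_filters, signal_length = 32, 1024
for filter_length in (8, 64, 512):
    # Gaussian weights with variance 1 / (num_filters * filter_length), so
    # the expected energy response equals 1 at every frequency.
    scale = 1.0 / np.sqrt(num_filters * filter_length)
    weights = scale * rng.standard_normal((num_filters, filter_length))
    print(filter_length, filterbank_condition_number(weights, signal_length))
```

A condition number of 1 means the filterbank is a tight frame; larger values indicate that some frequencies are amplified far more than others at initialization.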
SD · Aug 30, 2024
Hold Me Tight: Stable Encoder-Decoder Design for Speech Enhancement
Daniel Haider, Felix Perfler, Vincent Lostanlen et al.
Convolutional layers with 1-D filters are often used as a frontend to encode audio signals. Unlike fixed time-frequency representations, they can adapt to the local characteristics of input data. However, 1-D filters on raw audio are hard to train and often suffer from instabilities. In this paper, we address these problems with hybrid solutions, i.e., combining theory-driven and data-driven approaches. First, we preprocess the audio signals via an auditory filterbank, guaranteeing good frequency localization for the learned encoder. Second, we use results from frame theory to define an unsupervised learning objective that encourages energy conservation and perfect reconstruction. Third, we adapt mixed compressed spectral norms as learning objectives to the encoder coefficients. Using these solutions in a low-complexity encoder-mask-decoder model significantly improves the perceptual evaluation of speech quality (PESQ) in speech enhancement.
LG · Jul 8, 2025
Aliasing in Convnets: A Frame-Theoretic Perspective
Daniel Haider, Vincent Lostanlen, Martin Ehler et al.
Using a stride in a convolutional layer inherently introduces aliasing, which has implications for numerical stability and statistical generalization. While techniques such as parametrizations via paraunitary systems have been used to promote orthogonal convolution and thus ensure Parseval stability, a general analysis of aliasing and its effects on stability has not been done in this context. In this article, we adapt a frame-theoretic approach to describe aliasing in convolutional layers with 1D kernels, leading to practical estimates for stability bounds and characterizations of Parseval stability that are tailored to take short kernel sizes into account. From this, we derive two computationally very efficient optimization objectives that promote Parseval stability via systematically suppressing aliasing. Finally, for layers with random kernels, we derive closed-form expressions for the expected value and variance of the terms that describe the aliasing effects, revealing fundamental insights into the aliasing behavior at initialization.
LG · Jun 22, 2024
Injectivity of ReLU-layers: Tools from Frame Theory
Daniel Haider, Martin Ehler, Peter Balazs
Injectivity is the defining property of a mapping that ensures no information is lost and any input can be perfectly reconstructed from its output. By performing hard thresholding, the ReLU function naturally interferes with this property, making the injectivity analysis of ReLU layers in neural networks a challenging yet intriguing task that has not yet been fully solved. This article establishes a frame theoretic perspective to approach this problem. The main objective is to develop a comprehensive characterization of the injectivity behavior of ReLU layers in terms of all three involved ingredients: (i) the weights, (ii) the bias, and (iii) the domain where the data is drawn from. Maintaining a focus on practical applications, we limit our attention to bounded domains and present two methods for numerically approximating a maximal bias for given weights and data domains. These methods provide sufficient conditions for the injectivity of a ReLU layer on those domains and yield a novel practical methodology for studying the information loss in ReLU layers. Finally, we derive explicit reconstruction formulas based on the duality concept from frame theory.
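To make the role of weights, bias, and input concrete, here is a minimal sketch of one way to invert a ReLU layer at a single input: when the rows of the weight matrix that are active at that input span the input space, the input can be recovered from the active outputs by least squares. This is an illustration of the general idea and not the reconstruction formulas derived in the paper; the layer convention `ReLU(Wx - b)`, the function names, and the example values are assumptions.

```python
import numpy as np

def relu_layer(W, b, x):
    """ReLU layer with weight rows W[i] and bias b, assumed as ReLU(Wx - b)."""
    return np.maximum(W @ x - b, 0.0)

def reconstruct(W, b, y):
    """Recover x from y = relu_layer(W, b, x) using only the active rows."""
    active = y > 0
    W_active = W[active]
    # The active rows must span the input space for x to be determined.
    if np.linalg.matrix_rank(W_active) < W.shape[1]:
        raise ValueError("active rows do not span the input space")
    # On active rows the ReLU is the identity, so W_active @ x = y + b there.
    x, *_ = np.linalg.lstsq(W_active, y[active] + b[active], rcond=None)
    return x

# Hypothetical example: weight rows are the standard basis and its negatives.
W = np.vstack([np.eye(3), -np.eye(3)])
b = np.full(6, 0.1)
x = np.array([0.5, -0.2, 0.3])
x_rec = reconstruct(W, b, relu_layer(W, b, x))
```

Raising the bias deactivates more rows; once too few active rows remain at some input in the domain, the rank check fails, which is the kind of information loss the abstract refers to.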
IV · Sep 13, 2021
Blood vessel segmentation in en-face OCTA images: a frequency based method
Anna Breger, Felix Goldbach, Bianca S. Gerendas et al.
Optical coherence tomography angiography (OCTA) is a novel noninvasive imaging modality for visualization of retinal blood flow in the human retina. Using specific OCTA imaging biomarkers for the identification of pathologies, automated image segmentations of the blood vessels can improve subsequent analysis and diagnosis. We present a novel segmentation method for vessel density identification based on frequency representations of the image, in particular, using so-called Gabor filter banks. The algorithm is evaluated qualitatively and quantitatively on an OCTA image in-house data set from $10$ eyes acquired by a Cirrus HD-OCT device. Qualitatively, the segmentation outcomes received very good visual evaluation feedback by experts. Quantitatively, we compared the resulting vessel density values with automated in-built values provided by the device. The results underline the visual evaluation. For the evaluation of the FAZ identification substep, manual annotations of $2$ expert graders were used, showing that our results coincide well in visual and quantitative manners. Lastly, we suggest the computation of adaptive local vessel density maps that allow straightforward analysis of retinal blood flow in a local manner.
IV · Aug 2, 2019
An amplified-target loss approach for photoreceptor layer segmentation in pathological OCT scans
José Ignacio Orlando, Anna Breger, Hrvoje Bogunović et al.
Segmenting anatomical structures such as the photoreceptor layer in retinal optical coherence tomography (OCT) scans is challenging in pathological scenarios. Supervised deep learning models trained with standard loss functions are usually able to characterize only the most common disease appearance from a training set, resulting in suboptimal performance and poor generalization when dealing with unseen lesions. In this paper we propose to overcome this limitation by means of an augmented target loss function framework. We introduce a novel amplified-target loss that explicitly penalizes errors within the central area of the input images, based on the observation that most of the challenging disease appearance is usually located in this area. We experimentally validated our approach using a data set with OCT scans of patients with macular diseases. We observe increased performance compared to the models that use only the standard losses. Our proposed loss function strongly supports the segmentation model to better distinguish photoreceptors in highly pathological scenarios.
NA · Jan 22, 2019
On orthogonal projections for dimension reduction and applications in augmented target loss functions for learning problems
Anna Breger, Jose Ignacio Orlando, Pavol Harar et al.
The use of orthogonal projections on high-dimensional input and target data in learning frameworks is studied. First, we investigate the relations between two standard objectives in dimension reduction, preservation of variance and of pairwise relative distances. Investigations of their asymptotic correlation as well as numerical experiments show that a projection usually does not satisfy both objectives at once. In a standard classification problem we determine projections on the input data that balance the objectives and compare subsequent results. Next, we extend our application of orthogonal projections to deep learning tasks and introduce a general framework of augmented target loss functions. These loss functions integrate additional information via transformations and projections of the target data. In two supervised learning problems, clinical image segmentation and music information classification, the application of our proposed augmented target loss functions increases the accuracy.
FA · Feb 17, 2014
The Algebraic Approach to Phase Retrieval and Explicit Inversion at the Identifiability Threshold
Franz J Király, Martin Ehler
We study phase retrieval from magnitude measurements of an unknown signal as an algebraic estimation problem. Indeed, phase retrieval from rank-one and more general linear measurements can be treated in an algebraic way. It is verified that a certain number of generic rank-one or generic linear measurements are sufficient to enable signal reconstruction for generic signals, and slightly more generic measurements yield reconstructability for all signals. Our results solve a few open problems stated in the recent literature. Furthermore, we show how the algebraic estimation problem can be solved by a closed-form algebraic estimation technique, termed ideal regression, providing non-asymptotic success guarantees.