Clarice Poon

LG
h-index6
17papers
774citations
Novelty53%
AI Score47

17 Papers

ITJun 23, 2014
Breaking the coherence barrier: A new theory for compressed sensing

Ben Adcock, Anders C. Hansen, Clarice Poon et al.

This paper provides an extension of compressed sensing which bridges a substantial gap between existing theory and its current use in real-world applications. It introduces a mathematical framework that generalizes the three standard pillars of compressed sensing - namely, sparsity, incoherence and uniform random subsampling - to three new concepts: asymptotic sparsity, asymptotic incoherence and multilevel random sampling. The new theorems show that compressed sensing is also possible, and reveals several advantages, under these substantially relaxed conditions. The importance of this is threefold. First, inverse problems to which compressed sensing is currently applied are typically coherent. The new theory provides the first comprehensive mathematical explanation for a range of empirical usages of compressed sensing in real-world applications, such as medical imaging, microscopy, spectroscopy and others. Second, in showing that compressed sensing does not require incoherence, but instead that asymptotic incoherence is sufficient, the new theory offers markedly greater flexibility in the design of sensing mechanisms. Third, by using asymptotic incoherence and multi-level sampling to exploit not just sparsity, but also structure, i.e. asymptotic sparsity, the new theory shows that substantially improved reconstructions can be obtained from fewer measurements.

NAJan 13, 2013
Beyond consistent reconstructions: optimality and sharp bounds for generalized sampling, and application to the uniform resampling problem

Ben Adcock, Anders C. Hansen, Clarice Poon

Generalized sampling is a recently developed linear framework for sampling and reconstruction in separable Hilbert spaces. It allows one to recover any element in any finite-dimensional subspace given finitely many of its samples with respect to an arbitrary frame. Unlike more common approaches for this problem, such as the consistent reconstruction technique of Eldar et al, it leads to completely stable numerical methods possessing both guaranteed stability and accuracy. The purpose of this paper is twofold. First, we give a complete and formal analysis of generalized sampling, the main result of which being the derivation of new, sharp bounds for the accuracy and stability of this approach. Such bounds improve those given previously, and result in a necessary and sufficient condition, the stable sampling rate, which guarantees a priori a good reconstruction. Second, we address the topic of optimality. Under some assumptions, we show that generalized sampling is an optimal, stable reconstruction. Correspondingly, whenever these assumptions hold, the stable sampling rate is a universal quantity. In the final part of the paper we illustrate our results by applying generalized sampling to the so-called uniform resampling problem.

NAMay 12, 2013
On optimal wavelet reconstructions from Fourier samples: linearity and universality of the stable sampling rate

Ben Adcock, Anders C. Hansen, Clarice Poon

In this paper we study the problem of computing wavelet coefficients of compactly supported functions from their Fourier samples. For this, we use the recently introduced framework of generalized sampling. Our first result demonstrates that using generalized sampling one obtains a stable and accurate reconstruction, provided the number of Fourier samples grows linearly in the number of wavelet coefficients recovered. For the class of Daubechies wavelets we derive the exact constant of proportionality. Our second result concerns the optimality of generalized sampling for this problem. Under some mild assumptions we show that generalized sampling cannot be outperformed in terms of approximation quality by more than a constant factor. Moreover, for the class of so-called perfect methods, any attempt to lower the sampling ratio below a certain critical threshold necessarily results in exponential ill-conditioning. Thus generalized sampling provides a nearly-optimal solution to this problem.

NAMay 4
Hadamard Langevin dynamics for sampling the l1-prior

Ivan Cheltsov, Federico Cornalba, Clarice Poon et al.

Priors with non-smooth log-densities, such as the l1-prior, are widely used in Bayesian inverse problems for their sparsity-inducing properties. Existing Langevin-based sampling methods typically rely on proximal mappings or smooth approximations, which alter the target distribution. We propose an alternative approach based on a Hadamard product parameterization of the l1-norm, leading to a smooth but nonconvex and non-globally Lipschitz potential whose marginal law exactly recovers the desired posterior. The resulting Hadamard Langevin dynamics (HLD) defines a diffusion process that is analytically distinct from proximal or mirror-type Langevin schemes. Our main contribution is a rigorous well-posedness theory for both the continuous and discrete HLD. We establish existence and uniqueness of strong solutions, geometric ergodicity of the continuous dynamics, and convergence of the discretized scheme as the step size tends to zero. These results provide the first theoretical foundation for sampling from nonconvex, nonsmooth posteriors through overparameterized Langevin dynamics.

NAMar 6, 2016
A practical guide to the recovery of wavelet coefficients from Fourier measurements

Milana Gataric, Clarice Poon

In a series of recent papers (Adcock, Hansen and Poon, 2013, Appl. Comput. Harm. Anal. 45(5):3132-3167), (Adcock, Gataric and Hansen, 2014, SIAM J. Imaging Sci. 7(3):1690-1723) and (Adcock, Hansen, Kutyniok and Ma, 2015, SIAM J. Math. Anal. 47(2):1196-1233), it was shown that one can optimally recover the wavelet coefficients of an unknown compactly supported function from pointwise evaluations of its Fourier transform via the method of generalized sampling. While these papers focused on the optimality of generalized sampling in terms of its stability and error bounds, the current paper explains how this optimal method can be implemented to yield a computationally efficient algorithm. In particular, we show that generalized sampling has a computational complexity of $\mathcal{O}(M(N)\log N)$ when recovering the first $N$ boundary-corrected wavelet coefficients of an unknown compactly supported function from $M(N)$ Fourier samples. Therefore, due to the linear correspondences between the number of samples $M$ and number of coefficients $N$ shown previously, generalized sampling offers a computationally optimal way of recovering wavelet coefficients from Fourier data.

OCMay 3, 2022
Smooth over-parameterized solvers for non-smooth structured optimization

Clarice Poon, Gabriel Peyré

Non-smooth optimization is a core ingredient of many imaging or machine learning pipelines. Non-smoothness encodes structural constraints on the solutions, such as sparsity, group sparsity, low-rank and sharp edges. It is also the basis for the definition of robust loss functions and scale-free functionals such as square-root Lasso. Standard approaches to deal with non-smoothness leverage either proximal splitting or coordinate descent. These approaches are effective but usually require parameter tuning, preconditioning or some sort of support pruning. In this work, we advocate and study a different route, which operates a non-convex but smooth over-parametrization of the underlying non-smooth optimization problems. This generalizes quadratic variational forms that are at the heart of the popular Iterative Reweighted Least Squares (IRLS). Our main theoretical contribution connects gradient descent on this reformulation to a mirror descent flow with a varying Hessian metric. This analysis is crucial to derive convergence bounds that are dimension-free. This explains the efficiency of the method when using small grid sizes in imaging. Our main algorithmic contribution is to apply the Variable Projection (VarPro) method which defines a new formulation by explicitly minimizing over part of the variables. This leads to a better conditioning of the minimized functional and improves the convergence of simple but very efficient gradient-based methods, for instance quasi-Newton solvers. We exemplify the use of this new solver for the resolution of regularized regression problems for inverse problems and supervised learning, including total variation prior and non-convex regularizers.

ITFeb 7, 2016
On Cartesian line sampling with anisotropic total variation regularization

Clarice Poon

This paper considers the use of the anisotropic total variation seminorm to recover a two dimensional vector $x\in \mathbb{C}^{N\times N}$ from its partial Fourier coefficients, sampled along Cartesian lines. We prove that if $(x_{k,j} - x_{k-1,j})_{k,j}$ has at most $s_1$ nonzero coefficients in each column and $(x_{k,j} - x_{k,j-1})_{k,j}$ has at most $s_2$ nonzero coefficients in each row, then, up to multiplication by $\log$ factors, one can exactly recover $x$ by sampling along $s_1$ horizontal lines of its Fourier coefficients and along $s_2$ vertical lines of its Fourier coefficients. Finally, unlike standard compressed sensing estimates, the $\log$ factors involved are dependent on the separation distance between the nonzero entries in each row/column of the gradient of $x$ and not on $N^2$, the ambient dimension of $x$.

OCMay 11
On the global convergence of gradient descent for wide shallow models with bounded nonlinearities

Romain Petit, Clarice Poon, Gabriel Peyré

A surprising phenomenon in the training of neural networks is the ability of gradient descent to find global minimizers of the training loss despite its non-convexity. Following earlier works, we investigate this behavior for wide shallow networks. Existing results essentially cover the case of ReLU activations and the case of sigmoid activations with scalar output weights. We study a large class of models that includes multi-head attention layers and two-layer sigmoid networks with vector output weights. Building upon [Chizat and Bach, 2018], we prove that all non-global minimizers of the training loss are unstable under gradient descent dynamics. Thus, when the initial distribution of the parameters has full support (which includes the popular Gaussian case), and in the many hidden neurons or attention heads limit, continuous-time gradient descent can only converge to global minimizers. Establishing the instability of non-global minimizers corresponds to the construction of an ``escaping active set'' -- we complete the proof of [Chizat and Bach, 2018] to construct this set for models with bounded nonlinearities and scalar output weights. We also extend this construction to new cases for models with vector output weights. Finally, we show the well-posedness and the stability with respect to discretization of the mean field training dynamic for sub-Gaussian initializations.

LGOct 8, 2023
Compressed online Sinkhorn

Fengpei Wang, Clarice Poon, Tony Shardlow

The use of optimal transport (OT) distances, and in particular entropic-regularised OT distances, is an increasingly popular evaluation metric in many areas of machine learning and data science. Their use has largely been driven by the availability of efficient algorithms such as the Sinkhorn algorithm. One of the drawbacks of the Sinkhorn algorithm for large-scale data processing is that it is a two-phase method, where one first draws a large stream of data from the probability distributions, before applying the Sinkhorn algorithm to the discrete probability measures. More recently, there have been several works developing stochastic versions of Sinkhorn that directly handle continuous streams of data. In this work, we revisit the recently introduced online Sinkhorn algorithm of [Mensch and Peyré, 2020]. Our contributions are twofold: We improve the convergence analysis for the online Sinkhorn algorithm, the new rate that we obtain is faster than the previous rate under certain parameter choices. We also present numerical results to verify the sharpness of our result. Secondly, we propose the compressed online Sinkhorn algorithm which combines measure compression techniques with the online Sinkhorn algorithm. We provide numerical experiments to show practical numerical gains, as well as theoretical guarantees on the efficiency of our approach.

LGMay 11, 2025
Learning from Samples: Inverse Problems over measures via Sharpened Fenchel-Young Losses

Francisco Andrade, Gabriel Peyré, Clarice Poon

Estimating parameters from samples of an optimal probability distribution is essential in applications ranging from socio-economic modeling to biological system analysis. In these settings, the probability distribution arises as the solution to an optimization problem that captures either static interactions among agents or the dynamic evolution of a system over time. We introduce a general methodology based on a new class of loss functions, called sharpened Fenchel-Young losses, which measure the sub-optimality gap of the optimization problem over the space of probability measures. We provide explicit stability guarantees for two relevant settings in the context of optimal transport: The first is inverse unbalanced optimal transport (iUOT) with entropic regularization, where the parameters to estimate are cost functions that govern transport computations; this method has applications such as link prediction in machine learning. The second is inverse gradient flow (iJKO), where the objective is to recover a potential function that drives the evolution of a probability distribution via the Jordan-Kinderlehrer-Otto (JKO) time-discretization scheme; this is particularly relevant for understanding cell population dynamics in single-cell genomics. We also establish source conditions to ensure stability of our method under mirror stratifiable regularizers (such as l1 or nuclear norm) that promote structure. Finally, we present optimization algorithms specifically tailored to efficiently solve iUOT and iJKO problems. We validate our approach through numerical experiments on Gaussian distributions, where closed-form solutions are available, to demonstrate the practical performance of our methods.

MLJun 2, 2021
Smooth Bilevel Programming for Sparse Regularization

Clarice Poon, Gabriel Peyré

Iteratively reweighted least square (IRLS) is a popular approach to solve sparsity-enforcing regression problems in machine learning. State of the art approaches are more efficient but typically rely on specific coordinate pruning schemes. In this work, we show how a surprisingly simple reparametrization of IRLS, coupled with a bilevel resolution (instead of an alternating scheme) is able to achieve top performances on a wide range of sparsity (such as Lasso, group Lasso and trace norm regularizations), regularization strength (including hard constraints), and design matrices (ranging from correlated designs to differential operators). Similarly to IRLS, our method only involves linear systems resolutions, but in sharp contrast, corresponds to the minimization of a smooth function. Despite being non-convex, we show that there is no spurious minima and that saddle points are "ridable", so that there always exists a descent direction. We thus advocate for the use of a BFGS quasi-Newton solver, which makes our approach simple, robust and efficient. We perform a numerical benchmark of the convergence speed of our algorithm against state of the art solvers for Lasso, group Lasso, trace norm and linearly constrained problems. These results highlight the versatility of our approach, removing the need to use different solvers depending on the specificity of the ML problem under study.

LGJan 18, 2021
Screening for Sparse Online Learning

Jingwei Liang, Clarice Poon

Sparsity promoting regularizers are widely used to impose low-complexity structure (e.g. l1-norm for sparsity) to the regression coefficients of supervised learning. In the realm of deterministic optimization, the sequence generated by iterative algorithms (such as proximal gradient descent) exhibit "finite activity identification", namely, they can identify the low-complexity structure in a finite number of iterations. However, most online algorithms (such as proximal stochastic gradient descent) do not have the property owing to the vanishing step-size and non-vanishing variance. In this paper, by combining with a screening rule, we show how to eliminate useless features of the iterates generated by online algorithms, and thereby enforce finite activity identification. One consequence is that when combined with any convergent online algorithm, sparsity properties imposed by the regularizer can be exploited for computational gains. Numerically, significant acceleration can be obtained.

CVNov 23, 2020
An off-the-grid approach to multi-compartment magnetic resonance fingerprinting

Mohammad Golbabaee, Clarice Poon

We propose a novel numerical approach to separate multiple tissue compartments in image voxels and to estimate quantitatively their nuclear magnetic resonance (NMR) properties and mixture fractions, given magnetic resonance fingerprinting (MRF) measurements. The number of tissues, their types or quantitative properties are not a-priori known, but the image is assumed to be composed of sparse compartments with linearly mixed Bloch magnetisation responses within voxels. Fine-grid discretisation of the multi-dimensional NMR properties creates large and highly coherent MRF dictionaries that can challenge scalability and precision of the numerical methods for (discrete) sparse approximation. To overcome these issues, we propose an off-the-grid approach equipped with an extended notion of the sparse group lasso regularisation for sparse approximation using continuous (non-discretised) Bloch response models. Further, the nonlinear and non-analytical Bloch responses are approximated by a neural network, enabling efficient back-propagation of the gradients through the proposed algorithm. Tested on simulated and in-vivo healthy brain MRF data, we demonstrate effectiveness of the proposed scheme compared to the baseline multicompartment MRF methods.

MLNov 8, 2019
Degrees of freedom for off-the-grid sparse estimation

Clarice Poon, Gabriel Peyré

A central question in modern machine learning and imaging sciences is to quantify the number of effective parameters of vastly over-parameterized models. The degrees of freedom is a mathematically convenient way to define this number of parameters. Its computation and properties are well understood when dealing with discretized linear models, possibly regularized using sparsity. In this paper, we argue that this way of thinking is plagued when dealing with models having very large parameter spaces. In this case it makes more sense to consider "off-the-grid" approaches, using a continuous parameter space. This type of approach is the one favoured when training multi-layer perceptrons, and is also becoming popular to solve super-resolution problems in imaging. Training these off-the-grid models with a sparsity inducing prior can be achieved by solving a convex optimization problem over the space of measures, which is often called the Beurling Lasso (Blasso), and is the continuous counterpart of the celebrated Lasso parameter selection method. In previous works, the degrees of freedom for the Lasso was shown to coincide with the size of the smallest solution support. Our main contribution is a proof of a continuous counterpart to this result for the Blasso. Our findings suggest that discretized methods actually vastly over-estimate the number of intrinsic continuous degrees of freedom. Our second contribution is a detailed study of the case of sampling Fourier coefficients in 1D, which corresponds to a super-resolution problem. We show that our formula for the degrees of freedom is valid outside of a set of measure zero of observations, which in turn justifies its use to compute an unbiased estimator of the prediction risk using the Stein Unbiased Risk Estimator (SURE).

CVFeb 14, 2019
On instabilities of deep learning in image reconstruction - Does AI come at a cost?

Vegard Antun, Francesco Renna, Clarice Poon et al.

Deep learning, due to its unprecedented success in tasks such as image classification, has emerged as a new tool in image reconstruction with potential to change the field. In this paper we demonstrate a crucial phenomenon: deep learning typically yields unstablemethods for image reconstruction. The instabilities usually occur in several forms: (1) tiny, almost undetectable perturbations, both in the image and sampling domain, may result in severe artefacts in the reconstruction, (2) a small structural change, for example a tumour, may not be captured in the reconstructed image and (3) (a counterintuitive type of instability) more samples may yield poorer performance. Our new stability test with algorithms and easy to use software detects the instability phenomena. The test is aimed at researchers to test their networks for instabilities and for government agencies, such as the Food and Drug Administration (FDA), to secure safe use of deep learning methods.

NASep 10, 2017
Multi-dimensional Sparse Super-resolution

Clarice Poon, Gabriel Peyré

This paper studies sparse super-resolution in arbitrary dimensions. More precisely, it develops a theoretical analysis of support recovery for the so-called BLASSO method, which is an off-the-grid generalisation of l1 regularization (also known as the LASSO). While super-resolution is of paramount importance in overcoming the limitations of many imaging devices, its theoretical analysis is still lacking beyond the 1-dimensional (1-D) case. The reason is that in the 2-dimensional (2-D) case and beyond, the relative position of the spikes enters the picture, and different geometrical configurations lead to different stability properties. Our first main contribution is a connection, in the limit where the spikes cluster at a given point, between solutions of the dual of the BLASSO problem and Hermite polynomial interpolation ideals. Polynomial bases for these ideals, introduced by De Boor, can be computed by Gaussian elimination, and lead to an algorithmic description of limiting solutions to the dual problem. With this construction at hand, our second main contribution is a detailed analysis of the support stability and super-resolution effect in the case of a pair of spikes. This includes in particular a sharp analysis of how the signal-to-noise ratio should scale with respect to the separation distance between the spikes. Lastly, numerical simulations on different classes of kernels show the applicability of this theory and highlight the richness of super-resolution in 2-D.

NAJun 4, 2017
Sampling the Fourier transform along radial lines

Charles Dossal, Vincent Duval, Clarice Poon

This article considers the use of total variation minimization for the recovery of a superposition of point sources from samples of its Fourier transform along radial lines. We present a numerical algorithm for the computation of solutions to this infinite dimensional problem. The theoretical results of this paper make precise the link between the sampling operator and the recoverability of the point sources.