Jared Tanner

LG
h-index5
26papers
180citations
Novelty51%
AI Score50

26 Papers

AIOct 13, 2023Code
Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs

Yuxin Zhang, Lirui Zhao, Mingbao Lin et al.

The ever-increasing large language models (LLMs), though opening a potential path for the upcoming artificial general intelligence, sadly drops a daunting obstacle on the way towards their on-device deployment. As one of the most well-established pre-LLMs approaches in reducing model complexity, network pruning appears to lag behind in the era of LLMs, due mostly to its costly fine-tuning (or re-training) necessity under the massive volumes of model parameter and training data. To close this industry-academia gap, we introduce Dynamic Sparse No Training (DSnoT), a training-free fine-tuning approach that slightly updates sparse LLMs without the expensive backpropagation and any weight updates. Inspired by the Dynamic Sparse Training, DSnoT minimizes the reconstruction error between the dense and sparse LLMs, in the fashion of performing iterative weight pruning-and-growing on top of sparse LLMs. To accomplish this purpose, DSnoT particularly takes into account the anticipated reduction in reconstruction error for pruning and growing, as well as the variance w.r.t. different input data for growing each weight. This practice can be executed efficiently in linear time since its obviates the need of backpropagation for fine-tuning LLMs. Extensive experiments on LLaMA-V1/V2, Vicuna, and OPT across various benchmarks demonstrate the effectiveness of DSnoT in enhancing the performance of sparse LLMs, especially at high sparsity levels. For instance, DSnoT is able to outperform the state-of-the-art Wanda by 26.79 perplexity at 70% sparsity with LLaMA-7B. Our paper offers fresh insights into how to fine-tune sparse LLMs in an efficient training-free manner and open new venues to scale the great potential of sparsity to LLMs. Codes are available at https://github.com/zyxxmu/DSnoT.

58.6LGMay 27
Model Merging by Output-Space Projection

Bethan Evans, Benjamin Etheridge, Stephen Roberts et al.

Model merging combines fine-tuned checkpoints into a single multi-task model without retraining. Existing methods - such as task arithmetic, model soups, TIES, and DARE - are computationally efficient and empirically successful, but rely on heuristic design choices and lack formal optimality guarantees. We show that merging can be formulated as a convex quadratic programme over residual updates, yielding weights that minimise a squared-output calibration objective using calibration inputs and fine-tuned model outputs, and subsuming existing methods as special cases. Our framework yields a closed-form diagnostic - the fraction of residual energy captured by a chosen basis - that predicts downstream merge quality using only the calibration set. Empirically, the QP matches or outperforms existing methods in the single-layer setting, and we characterise when the optimal basis provides significant gains over the cheaper diagonal QP. We extend to multi-layer merging via a sequential layer-wise algorithm and demonstrate consistent gains across language and vision benchmarks.

ITMar 19, 2010
Improved Bounds on Restricted Isometry Constants for Gaussian Matrices

Bubacarr Bah, Jared Tanner

The Restricted Isometry Constants (RIC) of a matrix $A$ measures how close to an isometry is the action of $A$ on vectors with few nonzero entries, measured in the $\ell^2$ norm. Specifically, the upper and lower RIC of a matrix $A$ of size $n\times N$ is the maximum and the minimum deviation from unity (one) of the largest and smallest, respectively, square of singular values of all ${N\choose k}$ matrices formed by taking $k$ columns from $A$. Calculation of the RIC is intractable for most matrices due to its combinatorial nature; however, many random matrices typically have bounded RIC in some range of problem sizes $(k,n,N)$. We provide the best known bound on the RIC for Gaussian matrices, which is also the smallest known bound on the RIC for any large rectangular matrix. Improvements over prior bounds are achieved by exploiting similarity of singular values for matrices which share a substantial number of columns.

ITJul 15, 2013
Vanishingly Sparse Matrices and Expander Graphs, With Application to Compressed Sensing

Bubacarr Bah, Jared Tanner

We revisit the probabilistic construction of sparse random matrices where each column has a fixed number of nonzeros whose row indices are drawn uniformly at random with replacement. These matrices have a one-to-one correspondence with the adjacency matrices of fixed left degree expander graphs. We present formulae for the expected cardinality of the set of neighbors for these graphs, and present tail bounds on the probability that this cardinality will be less than the expected value. Deducible from these bounds are similar bounds for the expansion of the graph which is of interest in many applications. These bounds are derived through a more detailed analysis of collisions in unions of sets. Key to this analysis is a novel {\em dyadic splitting} technique. The analysis led to the derivation of better order constants that allow for quantitative theorems on existence of lossless expander graphs and hence the sparse random matrices we consider and also quantitative compressed sensing sampling theorems when using sparse non mean-zero measurement matrices.

NAJul 15, 2013
Bounds of restricted isometry constants in extreme asymptotics: formulae for Gaussian matrices

Bubacarr Bah, Jared Tanner

Restricted Isometry Constants (RICs) provide a measure of how far from an isometry a matrix can be when acting on sparse vectors. This, and related quantities, provide a mechanism by which standard eigen-analysis can be applied to topics relying on sparsity. RIC bounds have been presented for a variety of random matrices and matrix dimension and sparsity ranges. We provide explicitly formulae for RIC bounds, of n by N Gaussian matrices with sparsity k, in three settings: a) n/N fixed and k/n approaching zero, b) k/n fixed and n/N approaching zero, and c) n/N approaching zero with k/n decaying inverse logrithmically in N/n; in these three settings the RICs a) decay to zero, b) become unbounded (or approach inherent bounds), and c) approach a non-zero constant. Implications of these results for RIC based analysis of compressed sensing algorithms are presented.

NAApr 28, 2017
A robust parallel algorithm for combinatorial compressed sensing

Rodrigo Mendoza-Smith, Jared Tanner, Florian Wechsung

In previous work two of the authors have shown that a vector $x \in \mathbb{R}^n$ with at most $k < n$ nonzeros can be recovered from an expander sketch $Ax$ in $\mathcal{O}(\mathrm{nnz}(A)\log k)$ operations via the Parallel-$\ell_0$ decoding algorithm, where $\mathrm{nnz}(A)$ denotes the number of nonzero entries in $A \in \mathbb{R}^{m \times n}$. In this paper we present the Robust-$\ell_0$ decoding algorithm, which robustifies Parallel-$\ell_0$ when the sketch $Ax$ is corrupted by additive noise. This robustness is achieved by approximating the asymptotic posterior distribution of values in the sketch given its corrupted measurements. We provide analytic expressions that approximate these posteriors under the assumptions that the nonzero entries in the signal and the noise are drawn from continuous distributions. Numerical experiments presented show that Robust-$\ell_0$ is superior to existing greedy and combinatorial compressed sensing algorithms in the presence of small to moderate signal-to-noise ratios in the setting of Gaussian signals and Gaussian additive noise.

LGOct 27, 2022
Improved Projection Learning for Lower Dimensional Feature Maps

Ilan Price, Jared Tanner

The requirement to repeatedly move large feature maps off- and on-chip during inference with convolutional neural networks (CNNs) imposes high costs in terms of both energy and time. In this work we explore an improved method for compressing all feature maps of pre-trained CNNs to below a specified limit. This is done by means of learned projections trained via end-to-end finetuning, which can then be folded and fused into the pre-trained network. We also introduce a new `ceiling compression' framework in which evaluate such techniques in view of the future goal of performing inference fully on-chip.

NAMar 8, 2022
Tuning-free multi-coil compressed sensing MRI with Parallel Variable Density Approximate Message Passing (P-VDAMP)

Charles Millard, Mark Chiew, Jared Tanner et al.

Magnetic Resonance Imaging (MRI) has excellent soft tissue contrast but is hindered by an inherently slow data acquisition process. Compressed sensing, which reconstructs sparse signals from incoherently sampled data, has been widely applied to accelerate MRI acquisitions. Compressed sensing MRI requires one or more model parameters to be tuned, which is usually done by hand, giving sub-optimal tuning in general. To address this issue, we build on previous work by the authors on the single-coil Variable Density Approximate Message Passing (VDAMP) algorithm, extending the framework to multiple receiver coils to propose the Parallel VDAMP (P-VDAMP) algorithm. For Bernoulli random variable density sampling, P-VDAMP obeys a "state evolution", where the intermediate per-iteration image estimate is distributed according to the ground truth corrupted by a zero-mean Gaussian vector with approximately known covariance. To our knowledge, P-VDAMP is the first algorithm for multi-coil MRI data that obeys a state evolution with accurately tracked parameters. We leverage state evolution to automatically tune sparse parameters on-the-fly with Stein's Unbiased Risk Estimate (SURE). P-VDAMP is evaluated on brain, knee and angiogram datasets and compared with four variants of the Fast Iterative Shrinkage-Thresholding algorithm (FISTA), including two tuning-free variants from the literature. The proposed method is found to have a similar reconstruction quality and time to convergence as FISTA with an optimally tuned sparse weighting and offers substantial robustness and reconstruction quality improvements over competing tuning-free methods.

MLJan 31, 2023
On the Initialisation of Wide Low-Rank Feedforward Neural Networks

Thiziri Nait Saada, Jared Tanner

The edge-of-chaos dynamics of wide randomly initialized low-rank feedforward networks are analyzed. Formulae for the optimal weight and bias variances are extended from the full-rank to low-rank setting and are shown to follow from multiplicative scaling. The principle second order effect, the variance of the input-output Jacobian, is derived and shown to increase as the rank to width ratio decreases. These results inform practitioners how to randomly initialize feedforward networks with a reduced number of learnable parameters while in the same ambient dimension, allowing reductions in the computational cost and memory constraints of the associated network.

LGJan 30, 2023
Optimal Approximation Complexity of High-Dimensional Functions with Neural Networks

Vincent P. H. Goverse, Jad Hamdan, Jared Tanner

We investigate properties of neural networks that use both ReLU and $x^2$ as activation functions and build upon previous results to show that both analytic functions and functions in Sobolev spaces can be approximated by such networks of constant depth to arbitrary accuracy, demonstrating optimal order approximation rates across all nonlinear approximators, including standard ReLU networks. We then show how to leverage low local dimensionality in some contexts to overcome the curse of dimensionality, obtaining approximation rates that are optimal for unknown lower-dimensional subspaces.

MLOct 25, 2023
Beyond IID weights: sparse and low-rank deep Neural Networks are also Gaussian Processes

Thiziri Nait-Saada, Alireza Naderi, Jared Tanner

The infinitely wide neural network has been proven a useful and manageable mathematical model that enables the understanding of many phenomena appearing in deep learning. One example is the convergence of random deep networks to Gaussian processes that allows a rigorous analysis of the way the choice of activation function and network weights impacts the training dynamics. In this paper, we extend the seminal proof of Matthews et al. (2018) to a larger class of initial weight distributions (which we call PSEUDO-IID), including the established cases of IID and orthogonal weights, as well as the emerging low-rank and structured sparse settings celebrated for their computational speed-up benefits. We show that fully-connected and convolutional networks initialized with PSEUDO-IID distributions are all effectively equivalent up to their variance. Using our results, one can identify the Edge-of-Chaos for a broader class of neural networks and tune them at criticality in order to enhance their training. Moreover, they enable the posterior distribution of Bayesian Neural Networks to be tractable across these various initialization schemes.

LGJan 23
Theory of Minimal Weight Perturbations in Deep Networks and its Applications for Low-Rank Activated Backdoor Attacks

Bethan Evans, Jared Tanner

The minimal norm weight perturbations of DNNs required to achieve a specified change in output are derived and the factors determining its size are discussed. These single-layer exact formulae are contrasted with more generic multi-layer Lipschitz constant based robustness guarantees; both are observed to be of the same order which indicates similar efficacy in their guarantees. These results are applied to precision-modification-activated backdoor attacks, establishing provable compression thresholds below which such attacks cannot succeed, and show empirically that low-rank compression can reliably activate latent backdoors while preserving full-precision accuracy. These expressions reveal how back-propagated margins govern layer-wise sensitivity and provide certifiable guarantees on the smallest parameter updates consistent with a desired output shift.

LGFeb 5
How Controlling the Variance can Improve Training Stability of Sparsely Activated DNNs and CNNs

Emily Dent, Jared Tanner

The intermediate layers of deep networks can be characterised as a Gaussian process, in particular the Edge-of-Chaos (EoC) initialisation strategy prescribes the limiting covariance matrix of the Gaussian process. Here we show that the under-utilised chosen variance of the Gaussian process is important in the training of deep networks with sparsity inducing activation, such as a shifted and clipped ReLU, $\text{CReLU}_{τ,m}(x)=\min(\max(x-τ,0),m)$. Specifically, initialisations leading to larger fixed Gaussian process variances, allow for improved expressivity with activation sparsity as large as 90% in DNNs and CNNs, and generally improve the stability of the training process. Enabling full, or near full, accuracy at such high levels of sparsity in the hidden layers suggests a promising mechanism to reduce the energy consumption of machine learning models involving fully connected layers.

LGFeb 25, 2024
Deep Neural Network Initialization with Sparsity Inducing Activations

Ilan Price, Nicholas Daultry Ball, Samuel C. H. Lam et al.

Inducing and leveraging sparse activations during training and inference is a promising avenue for improving the computational efficiency of deep networks, which is increasingly important as network sizes continue to grow and their application becomes more widespread. Here we use the large width Gaussian process limit to analyze the behaviour, at random initialization, of nonlinear activations that induce sparsity in the hidden outputs. A previously unreported form of training instability is proven for arguably two of the most natural candidates for hidden layer sparsification; those being a shifted ReLU ($φ(x)=\max(0, x-τ)$ for $τ\ge 0$) and soft thresholding ($φ(x)=0$ for $|x|\leτ$ and $x-\text{sign}(x)τ$ for $|x|>τ$). We show that this instability is overcome by clipping the nonlinear activation magnitude, at a level prescribed by the shape of the associated Gaussian process variance map. Numerical experiments verify the theory and show that the proposed magnitude clipped sparsifying activations can be trained with training and test fractional sparsity as high as 85\% while retaining close to full accuracy.

LGMay 17, 2021
Activation function design for deep networks: linearity and effective initialisation

Michael Murray, Vinayak Abrol, Jared Tanner

The activation function deployed in a deep neural network has great influence on the performance of the network at initialisation, which in turn has implications for training. In this paper we study how to avoid two problems at initialisation identified in prior works: rapid convergence of pairwise input correlations, and vanishing and exploding gradients. We prove that both these problems can be avoided by choosing an activation function possessing a sufficiently large linear region around the origin, relative to the bias variance $σ_b^2$ of the network's random initialisation. We demonstrate empirically that using such activation functions leads to tangible benefits in practice, both in terms test and training accuracy as well as training time. Furthermore, we observe that the shape of the nonlinear activation outside the linear region appears to have a relatively limited impact on training. Finally, our results also allow us to train networks in a new hyperparameter regime, with a much larger bias variance than has previously been possible.

LGFeb 12, 2021
Dense for the Price of Sparse: Improved Performance of Sparsely Initialized Networks via a Subspace Offset

Ilan Price, Jared Tanner

That neural networks may be pruned to high sparsities and retain high accuracy is well established. Recent research efforts focus on pruning immediately after initialization so as to allow the computational savings afforded by sparsity to extend to the training process. In this work, we introduce a new `DCT plus Sparse' layer architecture, which maintains information propagation and trainability even with as little as 0.01% trainable kernel parameters remaining. We show that standard training of networks built with these layers, and pruned at initialization, achieves state-of-the-art accuracy for extreme sparsities on a variety of benchmark network architectures and datasets. Moreover, these results are achieved using only simple heuristics to determine the locations of the trainable parameters in the network, and thus without having to initially store or compute with the full, unpruned network, as is required by competing prune-at-initialization algorithms. Switching from standard sparse layers to DCT plus Sparse layers does not increase the storage footprint of a network and incurs only a small additional computational overhead.

LGDec 3, 2020
An Empirical Study of Derivative-Free-Optimization Algorithms for Targeted Black-Box Attacks in Deep Neural Networks

Giuseppe Ughi, Vinayak Abrol, Jared Tanner

We perform a comprehensive study on the performance of derivative free optimization (DFO) algorithms for the generation of targeted black-box adversarial attacks on Deep Neural Network (DNN) classifiers assuming the perturbation energy is bounded by an $\ell_\infty$ constraint and the number of queries to the network is limited. This paper considers four pre-existing state-of-the-art DFO-based algorithms along with the introduction of a new algorithm built on BOBYQA, a model-based DFO method. We compare these algorithms in a variety of settings according to the fraction of images that they successfully misclassify given a maximum number of queries to the DNN. The experiments disclose how the likelihood of finding an adversarial example depends on both the algorithm used and the setting of the attack; algorithms limiting the search of adversarial example to the vertices of the $\ell^\infty$ constraint work particularly well without structural defenses, while the presented BOBYQA based algorithm works better for especially small perturbation energies. This variance in performance highlights the importance of new algorithms being compared to the state-of-the-art in a variety of settings, and the effectiveness of adversarial defenses being tested using as wide a range of algorithms as possible.

NAJul 18, 2020
Compressed sensing of low-rank plus sparse matrices

Jared Tanner, Simon Vary

Expressing a matrix as the sum of a low-rank matrix plus a sparse matrix is a flexible model capturing global and local features in data popularized as Robust PCA (Candes et al., 2011; Chandrasekaran et al., 2009). Compressed sensing, matrix completion, and their variants (Eldar and Kutyniok, 2012; Foucart and Rauhut, 2013) have established that data satisfying low complexity models can be efficiently measured and recovered from a number of measurements proportional to the model complexity rather than the ambient dimension. This manuscript develops similar guarantees showing that $m\times n$ matrices that can be expressed as the sum of a rank-$r$ matrix and a $s$-sparse matrix can be recovered by computationally tractable methods from $\mathcal{O}(r(m+n-r)+s)\log(mn/s)$ linear measurements. More specifically, we establish that the low-rank plus sparse matrix set is closed provided the incoherence of the low-rank component is upper bounded as $μ<\sqrt{mn}/(r\sqrt{s})$, and subsequently, the restricted isometry constants for the aforementioned matrices remain bounded independent of problem size provided $p/mn$, $s/p$, and $r(m+n-r)/p$ remain fixed. Additionally, we show that semidefinite programming and two hard threshold gradient descent algorithms, NIHT and NAHT, converge to the measured matrix provided the measurement operator's RIC's are sufficiently small. These results also provably solve convex and non-convex formulation of Robust PCA with the asymptotically optimal fraction of corruptions $α=\mathcal{O}\left(1/(μr) \right)$, where $s = α^2 mn$, and improve the previously best known guarantees by not requiring that the fraction of corruptions is spread in every column and row by being upper bounded by $α$. Numerical experiments illustrating these results are shown for synthetic problems, dynamic-foreground/static-background separation, and multispectral imaging.

LGApr 10, 2020
Encoder blind combinatorial compressed sensing

Michael Murray, Jared Tanner

In its most elementary form, compressed sensing studies the design of decoding algorithms to recover a sufficiently sparse vector or code from a lower dimensional linear measurement vector. Typically it is assumed that the decoder has access to the encoder matrix, which in the combinatorial case is sparse and binary. In this paper we consider the problem of designing a decoder to recover a set of sparse codes from their linear measurements alone, that is without access to encoder matrix. To this end we study the matrix factorisation task of recovering both the encoder and sparse coding matrices from the associated linear measurement matrix. The contribution of this paper is a computationally efficient decoding algorithm, Decoder-Expander Based Factorisation, with strong performance guarantees. In particular, under mild assumptions on the sparse coding matrix and by deploying a novel random encoder matrix, we prove that Decoder-Expander Based Factorisation recovers both the encoder and sparse coding matrix at the optimal measurement rate with high probability and from a near optimal number of measurement vectors. In addition, our experiments demonstrate the efficacy and computational efficiency of our algorithm in practice. Beyond compressed sensing our results may be of interest for researchers working in areas such as linear sketching, coding theory and matrix compression.

LGFeb 24, 2020
A Model-Based Derivative-Free Approach to Black-Box Adversarial Examples: BOBYQA

Giuseppe Ughi, Vinayak Abrol, Jared Tanner

We demonstrate that model-based derivative free optimisation algorithms can generate adversarial targeted misclassification of deep networks using fewer network queries than non-model-based methods. Specifically, we consider the black-box setting, and show that the number of networks queries is less impacted by making the task more challenging either through reducing the allowed $\ell^{\infty}$ perturbation energy or training the network with defences against adversarial misclassification. We illustrate this by contrasting the BOBYQA algorithm with the state-of-the-art model-free adversarial targeted misclassification approaches based on genetic, combinatorial, and direct-search algorithms. We observe that for high $\ell^{\infty}$ energy perturbations on networks, the aforementioned simpler model-free methods require the fewest queries. In contrast, the proposed BOBYQA based method achieves state-of-the-art results when the perturbation energy decreases, or if the network is trained against adversarial perturbations.

MLNov 25, 2019
Trajectory growth lower bounds for random sparse deep ReLU networks

Ilan Price, Jared Tanner

This paper considers the growth in the length of one-dimensional trajectories as they are passed through deep ReLU neural networks, which, among other things, is one measure of the expressivity of deep networks. We generalise existing results, providing an alternative, simpler method for lower bounding expected trajectory growth through random networks, for a more general class of weights distributions, including sparsely connected networks. We illustrate this approach by deriving bounds for sparse-Gaussian, sparse-uniform, and sparse-discrete-valued random nets. We prove that trajectory growth can remain exponential in depth with these new distributions, including their sparse variants, with the sparsity parameter appearing in the base of the exponent.

CVFeb 28, 2019
Multispectral snapshot demosaicing via non-convex matrix completion

Giancarlo A. Antonucci, Simon Vary, David Humphreys et al.

Snapshot mosaic multispectral imagery acquires an undersampled data cube by acquiring a single spectral measurement per spatial pixel. Sensors which acquire $p$ frequencies, therefore, suffer from severe $1/p$ undersampling of the full data cube. We show that the missing entries can be accurately imputed using non-convex techniques from sparse approximation and matrix completion initialised with traditional demosaicing algorithms. In particular, we observe the peak signal-to-noise ratio can typically be improved by 2 to 5 dB over current state-of-the-art methods when simulating a $p=16$ mosaic sensor measuring both high and low altitude urban and rural scenes as well as ground-based scenes.

LGJun 26, 2018
Towards an understanding of CNNs: analysing the recovery of activation pathways via Deep Convolutional Sparse Coding

Michael Murray, Jared Tanner

Deep Convolutional Sparse Coding (D-CSC) is a framework reminiscent of deep convolutional neural networks (DCNNs), but by omitting the learning of the dictionaries one can more transparently analyse the role of the activation function and its ability to recover activation paths through the layers. Papyan, Romano, and Elad conducted an analysis of such an architecture, demonstrated the relationship with DCNNs and proved conditions under which the D-CSC is guaranteed to recover specific activation paths. A technical innovation of their work highlights that one can view the efficacy of the ReLU nonlinear activation function of a DCNN through a new variant of the tensor's sparsity, referred to as stripe-sparsity. Using this they proved that representations with an activation density proportional to the ambient dimension of the data are recoverable. We extend their uniform guarantees to a modified model and prove that with high probability the true activation is typically possible to recover for a greater density of activations per layer. Our extension follows from incorporating the prior work on one step thresholding by Schnass and Vandergheynst.

NAAug 6, 2015
Expander $\ell_0$-Decoding

Rodrigo Mendoza-Smith, Jared Tanner

We introduce two new algorithms, Serial-$\ell_0$ and Parallel-$\ell_0$ for solving a large underdetermined linear system of equations $y = Ax \in \mathbb{R}^m$ when it is known that $x \in \mathbb{R}^n$ has at most $k < m$ nonzero entries and that $A$ is the adjacency matrix of an unbalanced left $d$-regular expander graph. The matrices in this class are sparse and allow a highly efficient implementation. A number of algorithms have been designed to work exclusively under this setting, composing the branch of combinatorial compressed-sensing (CCS). Serial-$\ell_0$ and Parallel-$\ell_0$ iteratively minimise $\|y - A\hat x\|_0$ by successfully combining two desirable features of previous CCS algorithms: the information-preserving strategy of ER, and the parallel updating mechanism of SMP. We are able to link these elements and guarantee convergence in $\mathcal{O}(dn \log k)$ operations by assuming that the signal is dissociated, meaning that all of the $2^k$ subset sums of the support of $x$ are pairwise different. However, we observe empirically that the signal need not be exactly dissociated in practice. Moreover, we observe Serial-$\ell_0$ and Parallel-$\ell_0$ to be able to solve large scale problems with a larger fraction of nonzeros than other algorithms when the number of measurements is substantially less than the signal length; in particular, they are able to reliably solve for a $k$-sparse vector $x\in\mathbb{R}^n$ from $m$ expander measurements with $n/m=10^3$ and $k/m$ up to four times greater than what is achievable by $\ell_1$-regularization from dense Gaussian measurements. Additionally, Serial-$\ell_0$ and Parallel-$\ell_0$ are observed to be able to solve large problems sizes in substantially less time than other algorithms for compressed sensing. In particular, Parallel-$\ell_0$ is structured to take advantage of massively parallel architectures.

NAApr 22, 2015
Identification of Matrices having a Sparse Representation

Götz E. Pfander, Holger Rauhut, Jared Tanner

We consider the problem of recovering a matrix from its action on a known vector in the setting where the matrix can be represented efficiently in a known matrix dictionary. Connections with sparse signal recovery allows for the use of efficient reconstruction techniques such as Basis Pursuit. Of particular interest is the dictionary of time-frequency shift matrices and its role for channel estimation and identification in communications engineering. We present recovery results for Basis Pursuit with the time-frequency shift dictionary and various dictionaries of random matrices.

MGSep 26, 2006
Counting faces of randomly-projected polytopes when the projection radically lowers dimension

David L. Donoho, Jared Tanner

This paper develops asymptotic methods to count faces of random high-dimensional polytopes. Beyond its intrinsic interest, our conclusions have surprising implications - in statistics, probability, information theory, and signal processing - with potential impacts in practical subjects like medical imaging and digital communications. Three such implications concern: convex hulls of Gaussian point clouds, signal recovery from random projections, and how many gross errors can be efficiently corrected from Gaussian error correcting codes.