Rodrigo Mendoza-Smith

4 papers

27 citations

Novelty 52%

AI Score 36

Ranked #121,503 of 205,806 authors (top 59%); #26,506 in LG (top 63%)

4 Papers

NA · Apr 28, 2017

A robust parallel algorithm for combinatorial compressed sensing

Rodrigo Mendoza-Smith, Jared Tanner, Florian Wechsung

In previous work, two of the authors showed that a vector $x \in \mathbb{R}^n$ with at most $k < n$ nonzeros can be recovered from an expander sketch $Ax$ in $\mathcal{O}(\mathrm{nnz}(A)\log k)$ operations via the Parallel-$\ell_0$ decoding algorithm, where $\mathrm{nnz}(A)$ denotes the number of nonzero entries in $A \in \mathbb{R}^{m \times n}$. In this paper we present the Robust-$\ell_0$ decoding algorithm, which robustifies Parallel-$\ell_0$ when the sketch $Ax$ is corrupted by additive noise. This robustness is achieved by approximating the asymptotic posterior distribution of values in the sketch given its corrupted measurements. We provide analytic expressions that approximate these posteriors under the assumption that the nonzero entries in the signal and the noise are drawn from continuous distributions. The numerical experiments presented show that Robust-$\ell_0$ outperforms existing greedy and combinatorial compressed sensing algorithms at small to moderate signal-to-noise ratios when both the signal and the additive noise are Gaussian.
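
The idea of a posterior over sketch values can be illustrated with a toy Bayesian calculation. The sketch below is not the Robust-$\ell_0$ posterior from the paper; it is a minimal illustration under assumptions of our own: each column of $A$ selects $d$ of the $m$ rows uniformly at random, so the number $N$ of support entries landing in a given sketch row is $\mathrm{Binomial}(k, d/m)$; the nonzero signal entries are i.i.d. $\mathcal{N}(0,\sigma_x^2)$; and the noise is $\mathcal{N}(0,\sigma_e^2)$ with $\sigma_e > 0$. Given $N=\ell$, a noisy sketch entry is then $\mathcal{N}(0, \ell\sigma_x^2+\sigma_e^2)$, and Bayes' rule gives $P(N=\ell \mid \tilde y_i)$. The function name `sketch_posterior` and its parameters are hypothetical.

```python
import math

import numpy as np


def sketch_posterior(y_noisy, k, d, m, sigma_x, sigma_e):
    """Toy posterior P(N = l | y_noisy) for l = 0..k (hypothetical helper).

    Assumed model (illustrative, not the paper's derivation):
      * N ~ Binomial(k, d/m): support entries hitting one sketch row;
      * nonzero signal entries i.i.d. N(0, sigma_x^2);
      * additive noise N(0, sigma_e^2), sigma_e > 0, and d < m.
    Given N = l, the noisy entry is N(0, l * sigma_x^2 + sigma_e^2).
    """
    p = d / m
    ells = np.arange(k + 1)
    log_binom = np.array([math.lgamma(k + 1) - math.lgamma(l + 1) - math.lgamma(k - l + 1)
                          for l in range(k + 1)])
    log_prior = log_binom + ells * math.log(p) + (k - ells) * math.log1p(-p)
    var = ells * sigma_x**2 + sigma_e**2
    y = np.asarray(y_noisy, dtype=float)[..., None]  # broadcast over l
    log_lik = -0.5 * (np.log(2 * np.pi * var) + y**2 / var)
    log_post = log_prior + log_lik
    log_post -= log_post.max(axis=-1, keepdims=True)  # stabilise before exp
    post = np.exp(log_post)
    return post / post.sum(axis=-1, keepdims=True)
```

For example, `sketch_posterior([0.0, 3.0], k=10, d=8, m=100, sigma_x=1.0, sigma_e=0.01)[:, 0]` gives the posterior probability that each entry contains noise only; a decoder could use such probabilities to decide which residual entries to treat as zero. Working in log space avoids underflow when $\sigma_e$ is small.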

LG · Nov 3, 2025

Geometric Data Valuation via Leverage Scores

Rodrigo Mendoza-Smith

Shapley data valuation provides a principled, axiomatic framework for assigning importance to individual datapoints, and has gained traction in dataset curation, pruning, and pricing. However, it is a combinatorial measure that requires evaluating marginal utility across all subsets of the data, making it computationally infeasible at scale. We propose a geometric alternative based on statistical leverage scores, which quantify each datapoint's structural influence in the representation space by measuring how much it extends the span of the dataset and contributes to the effective dimensionality of the training problem. We show that our scores satisfy the dummy, efficiency, and symmetry axioms of Shapley valuation and that extending them to ridge leverage scores yields strictly positive marginal gains that connect naturally to classical A- and D-optimal design criteria. We further show that training on a leverage-sampled subset produces a model whose parameters and predictive risk are within $O(\varepsilon)$ of the full-data optimum, thereby providing a rigorous link between data valuation and downstream decision quality. Finally, in an active learning experiment, we demonstrate empirically that ridge-leverage sampling outperforms standard baselines without requiring access to gradients or backward passes.
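
The leverage scores themselves are straightforward to compute. The following NumPy sketch uses the standard definitions: the leverage score of row $x_i$ of a data matrix $X$ is the squared norm of the $i$-th row of an orthonormal basis $U$ for the column space of $X$, and the ridge leverage score is $x_i^\top (X^\top X + \lambda I)^{-1} x_i$. Some standard properties are easy to check numerically: plain leverage scores sum to $\mathrm{rank}(X)$, ridge leverage scores sum to the effective dimension $\sum_j \sigma_j^2/(\sigma_j^2+\lambda)$, a zero row gets score 0, and identical rows get identical scores, in line with the dummy and symmetry properties. The function names, the rank tolerance, and sampling without replacement are our assumptions, not details taken from the paper.

```python
import numpy as np


def leverage_scores(X, rtol=1e-10):
    """h_i = ||U_i||^2, where U is an orthonormal basis for the columns of X (n x p)."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    rank = int(np.sum(s > rtol * s.max()))  # numerical rank (assumed tolerance)
    return np.sum(U[:, :rank] ** 2, axis=1)


def ridge_leverage_scores(X, lam):
    """h_i(lam) = x_i^T (X^T X + lam I)^{-1} x_i, for lam > 0."""
    G = X.T @ X + lam * np.eye(X.shape[1])
    return np.einsum("ij,ji->i", X, np.linalg.solve(G, X.T))


def leverage_sample(X, size, lam=None, rng=None):
    """Draw `size` distinct row indices with probability proportional to
    (ridge) leverage. Sampling without replacement is an assumption here."""
    rng = np.random.default_rng(rng)
    h = leverage_scores(X) if lam is None else ridge_leverage_scores(X, lam)
    return rng.choice(X.shape[0], size=size, replace=False, p=h / h.sum())
```

Each ridge score lies strictly between 0 and 1 for a nonzero row, since $X^\top X + \lambda I \succeq x_i x_i^\top + \lambda I$. Only forward quantities of the representation $X$ are needed, with no gradients.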

LG · Jul 18, 2019

Federated Principal Component Analysis

Andreas Grammenos, Rodrigo Mendoza-Smith, Jon Crowcroft et al.

We present a federated, asynchronous, and $(\varepsilon, \delta)$-differentially private algorithm for PCA in the memory-limited setting. Our algorithm incrementally computes local model updates using a streaming procedure and adaptively estimates its $r$ leading principal components when only $\mathcal{O}(dr)$ memory is available, where $d$ is the dimensionality of the data. We guarantee differential privacy via an input-perturbation scheme in which the covariance matrix of a dataset $\mathbf{X} \in \mathbb{R}^{d \times n}$ is perturbed with a non-symmetric random Gaussian matrix with variance in $\mathcal{O}\left(\left(\frac{d}{n}\right)^2 \log d \right)$, thus improving upon the state of the art. Furthermore, unlike previous federated or distributed algorithms for PCA, our algorithm is invariant to permutations in the incoming data, which provides robustness against straggler or failed nodes. Numerical simulations show that, while using limited memory, our algorithm closely matches or outperforms traditional non-federated algorithms and, in the absence of communication latency, exhibits attractive horizontal scalability.
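
The federated structure can be illustrated with a common way of merging low-rank summaries: each node keeps only a $d \times r$ basis $U$ and its $r$ singular values $s$ (so $\mathcal{O}(dr)$ memory), and two summaries are merged by taking the SVD of $[U_a \operatorname{diag}(s_a),\ U_b \operatorname{diag}(s_b)]$. We assume this merge rule for illustration; the paper's exact update may differ. Because $U\operatorname{diag}(s^2)U^\top$ reproduces a block's Gram matrix when no components are dropped, the merge is then exact and independent of the order in which summaries arrive. The `private_subspace` helper shows input perturbation with a non-symmetric i.i.d. Gaussian matrix added to the covariance; its noise level `sigma` is a hypothetical caller-chosen parameter and is not calibrated to any $(\varepsilon, \delta)$ here. All function names are hypothetical.

```python
import numpy as np


def local_summary(X_block, r):
    """Keep the top-r left singular vectors and values of a d x n_i block."""
    U, s, _ = np.linalg.svd(X_block, full_matrices=False)
    return U[:, :r], s[:r]


def merge(summary_a, summary_b, r):
    """Merge two (U, s) summaries via the SVD of [U_a diag(s_a), U_b diag(s_b)]."""
    (Ua, sa), (Ub, sb) = summary_a, summary_b
    stacked = np.hstack([Ua * sa, Ub * sb])  # d x (r_a + r_b): O(dr) memory
    U, s, _ = np.linalg.svd(stacked, full_matrices=False)
    return U[:, :r], s[:r]


def private_subspace(X, r, sigma, rng=None):
    """Input perturbation sketch: add a non-symmetric i.i.d. N(0, sigma^2)
    matrix to the empirical covariance X X^T / n and return its top-r left
    singular vectors. `sigma` is NOT calibrated to (epsilon, delta) here."""
    rng = np.random.default_rng(rng)
    d, n = X.shape
    noise = rng.normal(0.0, sigma, size=(d, d))
    U, _, _ = np.linalg.svd(X @ X.T / n + noise)
    return U[:, :r]
```

When nodes truncate their summaries to rank $r$, the merge becomes approximate, which is the usual price of the $\mathcal{O}(dr)$ memory budget.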

NA · Aug 6, 2015

Expander $\ell_0$-Decoding

Rodrigo Mendoza-Smith, Jared Tanner

We introduce two new algorithms, Serial-$\ell_0$ and Parallel-$\ell_0$, for solving a large underdetermined linear system of equations $y = Ax \in \mathbb{R}^m$ when it is known that $x \in \mathbb{R}^n$ has at most $k < m$ nonzero entries and that $A$ is the adjacency matrix of an unbalanced left $d$-regular expander graph. The matrices in this class are sparse and allow a highly efficient implementation. A number of algorithms have been designed to work exclusively in this setting, forming the branch of combinatorial compressed sensing (CCS). Serial-$\ell_0$ and Parallel-$\ell_0$ iteratively minimise $\|y - A\hat x\|_0$ by combining two desirable features of previous CCS algorithms: the information-preserving strategy of ER and the parallel updating mechanism of SMP. We are able to link these elements and guarantee convergence in $\mathcal{O}(dn \log k)$ operations by assuming that the signal is dissociated, meaning that all of the $2^k$ subset sums of the nonzero entries of $x$ are pairwise different. However, we observe empirically that the signal need not be exactly dissociated in practice. Moreover, we observe that Serial-$\ell_0$ and Parallel-$\ell_0$ can solve large-scale problems with a larger fraction of nonzeros than other algorithms when the number of measurements is substantially less than the signal length; in particular, they reliably recover a $k$-sparse vector $x\in\mathbb{R}^n$ from $m$ expander measurements with $n/m=10^3$ and $k/m$ up to four times greater than what is achievable by $\ell_1$-regularization from dense Gaussian measurements. Additionally, Serial-$\ell_0$ and Parallel-$\ell_0$ are observed to solve large problem sizes in substantially less time than other algorithms for compressed sensing. In particular, Parallel-$\ell_0$ is structured to take advantage of massively parallel architectures.
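
The core update can be sketched serially. In this minimal version, which is in the spirit of Serial-$\ell_0$ but is not the authors' implementation, column $j$ is updated by $\omega$ whenever the nonzero value $\omega$ appears on a strict majority of the $d$ residual entries indexed by $j$'s neighbours. The strict-majority threshold, the integer-valued signal (so exact equality comparisons are safe; floating-point data would need a tolerance), and the helper names `expander_matrix` and `serial_l0` are assumptions made for illustration. The random left $d$-regular matrix is an expander only with high probability.

```python
from collections import Counter

import numpy as np


def expander_matrix(m, n, d, rng):
    """Random left d-regular bipartite adjacency matrix (m x n, d ones per column)."""
    A = np.zeros((m, n), dtype=np.int64)
    for j in range(n):
        A[rng.choice(m, size=d, replace=False), j] = 1
    return A


def serial_l0(A, y, max_sweeps=100):
    """Serial l0-style decoding sketch for integer-valued y = A @ x.

    Returns (x_hat, r); r == y - A @ x_hat is maintained exactly.
    """
    n = A.shape[1]
    neighbours = [np.flatnonzero(A[:, j]) for j in range(n)]
    x_hat = np.zeros(n, dtype=y.dtype)
    r = y.copy()
    for _ in range(max_sweeps):
        changed = False
        for j, rows in enumerate(neighbours):
            # How often does each nonzero residual value occur on j's neighbours?
            counts = Counter(v for v in r[rows].tolist() if v != 0)
            if not counts:
                continue
            w, c = counts.most_common(1)[0]
            if c > len(rows) // 2:  # assumed strict-majority threshold
                x_hat[j] += w
                r[rows] -= w  # keeps r == y - A @ x_hat
                changed = True
        if not changed or not r.any():
            break
    return x_hat, r
```

Each accepted update zeros $c > d/2$ residual entries and can create at most $d - c$ new nonzeros, because entries holding a different nonzero value stay nonzero after subtracting $\omega$; hence $\|y - A\hat x\|_0$ strictly decreases and the loop terminates. For example, with columns supported on rows $\{0,1,2\}$, $\{2,3,4\}$, $\{4,5,0\}$ and $x = (1, 2, 0)$, the decoder returns $\hat x = (1, 2, 0)$ with zero residual.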