Alberto Del Pia

LG
h-index: 15
6 papers
33 citations
Novelty: 54%
AI Score: 40

6 Papers

LG Jun 9, 2022
Clustering with Queries under Semi-Random Noise

Alberto Del Pia, Mingchen Ma, Christos Tzamos

The seminal paper by Mazumdar and Saha \cite{MS17a} initiated an extensive line of work on clustering with noisy queries. Yet, despite significant progress on the problem, the proposed methods depend crucially on knowing the exact error probabilities of the underlying fully-random oracle. In this work, we develop robust learning methods that tolerate general semi-random noise, obtaining qualitatively the same guarantees as the best possible methods in the fully-random model. More specifically, given a set of $n$ points with an unknown underlying partition, we are allowed to query pairs of points $u,v$ to check if they are in the same cluster, but with probability $p$, the answer may be adversarially chosen. We show that, information-theoretically, $O\left(\frac{nk \log n}{(1-2p)^2}\right)$ queries suffice to learn any cluster of sufficiently large size. Our main result is a computationally efficient algorithm that can identify large clusters with $O\left(\frac{nk \log n}{(1-2p)^2}\right) + \text{poly}\left(\log n, k, \frac{1}{1-2p}\right)$ queries, matching the guarantees of the best known algorithms in the fully-random model. As a corollary of our approach, we develop the first parameter-free algorithm for the fully-random model, answering an open question by \cite{MS17a}.
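The query model described in this abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm: it only models a semi-random oracle (with probability $p$ an adversary picks the answer, otherwise the answer is truthful) and a naive majority vote against reference points that are assumed to be already known to share a cluster. The names `make_semi_random_oracle`, `majority_same_cluster`, and the `adversary` callback are hypothetical, and caching answers per pair (so that repeating a query gives no new information) is an assumption of this sketch.

```python
import random

def make_semi_random_oracle(labels, p, adversary, seed=0):
    """Return query(u, v) -> bool for a hypothetical semi-random same-cluster oracle."""
    rng = random.Random(seed)
    cache = {}  # assumption: answers are persistent, so each pair is decided once

    def query(u, v):
        key = (min(u, v), max(u, v))
        if key not in cache:
            truth = labels[u] == labels[v]
            # With probability p the adversary chooses the answer; otherwise it is truthful.
            cache[key] = adversary(u, v, truth) if rng.random() < p else truth
        return cache[key]

    return query

def majority_same_cluster(query, u, reference_points):
    """Naive majority vote: does u belong with the (assumed same-cluster) reference points?"""
    votes = sum(1 if query(u, r) else -1 for r in reference_points)
    return votes > 0

# Two ground-truth clusters of 200 points each; the adversary lies whenever it is allowed to.
labels = [0] * 200 + [1] * 200
always_lie = lambda u, v, truth: not truth
query = make_semi_random_oracle(labels, p=0.2, adversary=always_lie, seed=1)
reference = list(range(199))  # points assumed to be known members of cluster 0
in_cluster = majority_same_cluster(query, 199, reference)
out_of_cluster = majority_same_cluster(query, 300, reference)
```

With $p < 1/2$ the vote concentrates around the truth as the number of reference points grows, which is the intuition behind the $(1-2p)^2$ term in the query bounds above.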

OC Jul 2, 2025
An SDP Relaxation for the Sparse Integer Least Squares Problem

Alberto Del Pia, Dekun Zhou

In this paper, we study the \emph{sparse integer least squares problem} (SILS), an NP-hard variant of least squares with sparse $\{0, \pm 1\}$-vectors. We propose an $\ell_1$-based SDP relaxation, and a randomized algorithm for SILS, which computes feasible solutions with high probability with an asymptotic approximation ratio $1/T^2$ as long as the sparsity constant $\sigma \ll T$. Our algorithm handles large-scale problems, delivering high-quality approximate solutions for dimensions up to $d = 10{,}000$. The proposed randomized algorithm applies broadly to binary quadratic programs with a cardinality constraint, even for non-convex objectives. For fixed sparsity, we provide sufficient conditions for our SDP relaxation to solve SILS, meaning that any optimal solution to the SDP relaxation yields an optimal solution to SILS. The class of input data for which the SDP relaxation is guaranteed to solve SILS is broad enough to cover many cases in real-world applications, such as privacy-preserving identification and multiuser detection. We validate these conditions in two application-specific cases: the \emph{feature extraction problem}, where our relaxation solves the problem for sub-Gaussian data with weak covariance conditions, and the \emph{integer sparse recovery problem}, where our relaxation solves the problem in both high and low coherence settings under certain conditions.

LG Oct 18, 2024
Efficient Sparse PCA via Block-Diagonalization

Alberto Del Pia, Dekun Zhou, Yinglun Zhu

Sparse Principal Component Analysis (Sparse PCA) is a pivotal tool in data analysis and dimensionality reduction. However, Sparse PCA is a challenging problem in both theory and practice: it is known to be NP-hard and current exact methods generally require exponential runtime. In this paper, we propose a novel framework to efficiently approximate Sparse PCA by (i) approximating the general input covariance matrix with a re-sorted block-diagonal matrix, (ii) solving the Sparse PCA sub-problem in each block, and (iii) reconstructing the solution to the original problem. Our framework is simple and powerful: it can leverage any off-the-shelf Sparse PCA algorithm and achieve significant computational speedups, with a minor additive error that is linear in the approximation error of the block-diagonal matrix. Suppose $g(k, d)$ is the runtime of an algorithm (approximately) solving Sparse PCA in dimension $d$ and with sparsity constant $k$. Our framework, when integrated with this algorithm, reduces the runtime to $\mathcal{O}\left(\frac{d}{d^\star} \cdot g(k, d^\star) + d^2\right)$, where $d^\star \leq d$ is the largest block size of the block-diagonal matrix. For instance, integrating our framework with the Branch-and-Bound algorithm reduces the complexity from $g(k, d) = \mathcal{O}(k^3\cdot d^k)$ to $\mathcal{O}(k^3\cdot d \cdot (d^\star)^{k-1})$, demonstrating exponential speedups if $d^\star$ is small. We perform large-scale evaluations on many real-world datasets: for an exact Sparse PCA algorithm, our method achieves an average speedup factor of 100.50, while maintaining an average approximation error of 0.61%; for an approximate Sparse PCA algorithm, our method achieves an average speedup factor of 6.00 and an average approximation error of -0.91%, meaning that our method oftentimes finds better solutions.
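Steps (i) to (iii) of this abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the block structure is obtained here by thresholding entries with $|A_{ij}| \le \varepsilon$ and taking connected components of the remaining support graph, original entries are kept inside each block, and a brute-force enumeration stands in for the "off-the-shelf" solver. The names `threshold_blocks`, `brute_force_spca`, `block_spca`, and the parameter `eps` are hypothetical.

```python
import itertools
import numpy as np

def threshold_blocks(A, eps):
    """(i) Ignore entries with |A_ij| <= eps; blocks are connected components of what remains."""
    d = A.shape[0]
    adj = np.abs(A) > eps
    seen, blocks = set(), []
    for start in range(d):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:  # depth-first search over the support graph
            i = stack.pop()
            comp.append(i)
            for j in np.flatnonzero(adj[i]):
                j = int(j)
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        blocks.append(sorted(comp))
    return blocks

def brute_force_spca(A, k):
    """Stand-in exact solver: best top eigenvalue over all supports of size min(k, d)."""
    d = A.shape[0]
    best_val, best_sup = -np.inf, None
    for sup in itertools.combinations(range(d), min(k, d)):
        val = np.linalg.eigvalsh(A[np.ix_(sup, sup)])[-1]  # largest eigenvalue
        if val > best_val:
            best_val, best_sup = val, sup
    return best_val, best_sup

def block_spca(A, k, eps, solver=brute_force_spca):
    """(ii) Solve each block separately, (iii) map the best support back to original indices."""
    best_val, best_sup = -np.inf, None
    for block in threshold_blocks(A, eps):
        val, sup = solver(A[np.ix_(block, block)], k)
        if val > best_val:
            best_val, best_sup = val, tuple(block[i] for i in sup)
    return best_val, best_sup

# Toy covariance whose blocks {0, 2} and {1, 3} are interleaved, plus small cross-block noise.
A = np.array([[2.00, 0.01, 1.00, 0.01],
              [0.01, 1.00, 0.01, 0.50],
              [1.00, 0.01, 2.00, 0.01],
              [0.01, 0.50, 0.01, 1.00]])
blocks = threshold_blocks(A, eps=0.05)
value, support = block_spca(A, k=2, eps=0.05)
```

Because the solver only ever sees blocks of size at most $d^\star$, the enumeration cost depends on $d^\star$ rather than $d$, which mirrors the runtime reduction stated above.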

ML Jul 12, 2025
A Randomized Algorithm for Sparse PCA based on the Basic SDP Relaxation

Alberto Del Pia, Dekun Zhou

Sparse Principal Component Analysis (SPCA) is a fundamental technique for dimensionality reduction, and is NP-hard. In this paper, we introduce a randomized approximation algorithm for SPCA, which is based on the basic SDP relaxation. Our algorithm has an approximation ratio of at most the sparsity constant with high probability, if called enough times. Under a technical assumption, which is consistently satisfied in our numerical tests, the average approximation ratio is also bounded by $\mathcal{O}(\log{d})$, where $d$ is the number of features. We show that this technical assumption is satisfied if the SDP solution is low-rank, or has exponentially decaying eigenvalues. We then present a broad class of instances for which this technical assumption holds. We also demonstrate that in a covariance model, which generalizes the spiked Wishart model, our proposed algorithm achieves a near-optimal approximation ratio. We demonstrate the efficacy of our algorithm through numerical results on real-world datasets.

LG Jan 7, 2022
Sparse PCA on fixed-rank matrices

Alberto Del Pia

Sparse PCA is the optimization problem obtained from PCA by adding a sparsity constraint on the principal components. Sparse PCA is NP-hard and hard to approximate even in the single-component case. In this paper we settle the computational complexity of sparse PCA with respect to the rank of the covariance matrix. We show that, if the rank of the covariance matrix is a fixed value, then there is an algorithm that solves sparse PCA to global optimality, whose running time is polynomial in the number of features. We also prove a similar result for the version of sparse PCA which requires the principal components to have disjoint supports.

OC Oct 5, 2018
Subset selection in sparse matrices

Alberto Del Pia, Santanu S. Dey, Robert Weismantel

In subset selection we search for the best linear predictor that involves a small subset of variables. From a computational complexity viewpoint, subset selection is NP-hard and few classes are known to be solvable in polynomial time. Using mainly tools from discrete geometry, we show that some sparsity conditions on the original data matrix allow us to solve the problem in polynomial time.
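To make the problem statement concrete, the sketch below solves subset selection by exhaustive search: it tries every subset of $k$ columns and keeps the one with the smallest least-squares residual. This is only the naive exponential-time baseline, not the polynomial-time method of the paper; the function name `best_subset` and the synthetic data are hypothetical.

```python
import itertools
import numpy as np

def best_subset(X, y, k):
    """Exhaustive subset selection: minimize ||X[:, S] b - y||^2 over all |S| = k."""
    best_rss, best_S, best_coef = np.inf, None, None
    for S in itertools.combinations(range(X.shape[1]), k):
        cols = list(S)
        coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        rss = float(np.sum((X[:, cols] @ coef - y) ** 2))  # residual sum of squares
        if rss < best_rss:
            best_rss, best_S, best_coef = rss, S, coef
    return best_rss, best_S, best_coef

# Synthetic data (an assumption for illustration): y depends only on columns 0 and 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = 2.0 * X[:, 0] - X[:, 2]
rss, subset, coef = best_subset(X, y, k=2)
```

The loop visits $\binom{n}{k}$ subsets for $n$ variables, which is why structural conditions on the data matrix are needed to obtain a polynomial-time algorithm.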