Antonis Matakos

LG
h-index73
3papers
13citations
Novelty53%
AI Score29

3 Papers

LGJun 16, 2022
Generalized Leverage Scores: Geometric Interpretation and Applications

Bruno Ordozgoiti, Antonis Matakos, Aristides Gionis

In problems involving matrix computations, the concept of leverage has found a large number of applications. In particular, leverage scores, which relate the columns of a matrix to the subspaces spanned by its leading singular vectors, are helpful in revealing column subsets to approximately factorize a matrix with quality guarantees. As such, they provide a solid foundation for a variety of machine-learning methods. In this paper we extend the definition of leverage scores to relate the columns of a matrix to arbitrary subsets of singular vectors. We establish a precise connection between column and singular-vector subsets, by relating the concepts of leverage scores and principal angles between subspaces. We employ this result to design approximation algorithms with provable guarantees for two well-known problems: generalized column subset selection and sparse canonical correlation analysis. We run numerical experiments to provide further insight on the proposed methods. The novel bounds we derive improve our understanding of fundamental concepts in matrix approximations. In addition, our insights may serve as building blocks for further contributions.

LGJun 7, 2023
Fair Column Subset Selection

Antonis Matakos, Bruno Ordozgoiti, Suhas Thejaswi

The problem of column subset selection asks for a subset of columns from an input matrix such that the matrix can be reconstructed as accurately as possible within the span of the selected columns. A natural extension is to consider a setting where the matrix rows are partitioned into two groups, and the goal is to choose a subset of columns that minimizes the maximum reconstruction error of both groups, relative to their respective best rank-k approximation. Extending the known results of column subset selection to this fair setting is not straightforward: in certain scenarios it is unavoidable to choose columns separately for each group, resulting in double the expected column count. We propose a deterministic leverage-score sampling strategy for the fair setting and show that sampling a column subset of minimum size becomes NP-hard in the presence of two groups. Despite these negative results, we give an approximation algorithm that guarantees a solution within 1.5 times the optimal solution size. We also present practical heuristic algorithms based on rank-revealing QR factorization. Finally, we validate our methods through an extensive set of experiments using real-world data.

LGMar 27, 2025
Fair PCA, One Component at a Time

Antonis Matakos, Martino Ciaperoni, Heikki Mannila

The Min-Max Fair PCA problem seeks a low-rank representation of multi-group data such that the the approximation error is as balanced as possible across groups. Existing approaches to this problem return a rank-$d$ fair subspace, but lack the fundamental containment property of standard PCA: each rank-$d$ PCA subspace should contain all lower-rank PCA subspaces. To fill this gap, we define fair principal components as directions that minimize the maximum group-wise reconstruction error, subject to orthogonality with previously selected components, and we introduce an iterative method to compute them. This approach preserves the containment property of standard PCA, and reduces to standard \pca for data with a single group. We analyze the theoretical properties of our method and show empirically that it outperforms existing approaches to Min-Max Fair PCA.