Distributed Estimation of Generalized Matrix Rank: Efficient Algorithms and Lower Bounds
This work addresses communication efficiency for distributed matrix computations, which is crucial for large-scale data processing, though it is incremental as it builds on existing distributed algorithms.
The paper tackles the problem of estimating the generalized matrix rank in a distributed setting, where the matrix is split across machines, and shows that any deterministic algorithm requires Ω(n²) bits of communication, but a randomized algorithm achieves only Õ(n) bits, with matching lower bounds, and demonstrates practical effectiveness through experiments.
We study the following generalized matrix rank estimation problem: given an $n \times n$ matrix and a constant $c \geq 0$, estimate the number of eigenvalues that are greater than $c$. In the distributed setting, the matrix of interest is the sum of $m$ matrices held by separate machines. We show that any deterministic algorithm solving this problem must communicate $Ω(n^2)$ bits, which is order-equivalent to transmitting the whole matrix. In contrast, we propose a randomized algorithm that communicates only $\widetilde O(n)$ bits. The upper bound is matched by an $Ω(n)$ lower bound on the randomized communication complexity. We demonstrate the practical effectiveness of the proposed algorithm with some numerical experiments.