Sublinear Spectral Clustering Oracle with Little Memory
This work addresses the memory bottleneck of existing spectral clustering oracles for massive graphs, enabling clustering queries with extremely low memory.
The paper presents sublinear spectral clustering oracles for well-clusterable graphs that use much less than O(√n) memory (e.g., O(n^0.01)) while still answering membership queries in sublinear time, achieving a memory-time trade-off S·T = Õ(n) for graphs with logarithmic conductance gap, which is nearly optimal.
We study the problem of designing \emph{sublinear spectral clustering oracles} for well-clusterable graphs. Such an oracle is an algorithm that, given query access to the adjacency list of a graph $G$, first constructs a compact data structure $\mathcal{D}$ that captures the clustering structure of $G$. Once built, $\mathcal{D}$ enables sublinear time responses to \textsc{WhichCluster}$(G,x)$ queries for any vertex $x$. A major limitation of existing oracles is that constructing $\mathcal{D}$ requires $Ω(\sqrt{n})$ memory, which becomes a bottleneck for massive graphs and memory-limited settings. In this paper, we break this barrier and establish a memory-time trade-off for sublinear spectral clustering oracles. Specifically, for well-clusterable graphs, we present oracles that construct $\mathcal{D}$ using much smaller than $O(\sqrt{n})$ memory (e.g., $O(n^{0.01})$) while still answering membership queries in sublinear time. We also characterize the trade-off frontier between memory usage $S$ and query time $T$, showing, for example, that $S\cdot T=\widetilde{O}(n)$ for clusterable graphs with a logarithmic conductance gap, and we show that this trade-off is nearly optimal (up to logarithmic factors) for a natural class of approaches. Finally, to complement our theory, we validate the performance of our oracles through experiments on synthetic networks.