Approximate Tree Completion and Learning-Augmented Algorithms for Metric Minimum Spanning Trees
This work addresses the computational bottleneck of MSTs in ML applications, offering a practical solution for faster hierarchical clustering and similar tasks, though it is incremental in improving approximation algorithms.
The paper tackles the problem of efficiently approximating minimum spanning trees (MSTs) in metric spaces, which is fundamental for hierarchical clustering and other ML tasks, by introducing a framework that combines practical heuristics with a subquadratic 2.62-approximation algorithm and learning-augmented techniques, achieving nearly optimal trees with orders of magnitude speedup over exact methods.
Finding a minimum spanning tree (MST) for $n$ points in an arbitrary metric space is a fundamental primitive for hierarchical clustering and many other ML tasks, but this takes $Ω(n^2)$ time to even approximate. We introduce a framework for metric MSTs that first (1) finds a forest of disconnected components using practical heuristics, and then (2) finds a small weight set of edges to connect disjoint components of the forest into a spanning tree. We prove that optimally solving the second step still takes $Ω(n^2)$ time, but we provide a subquadratic 2.62-approximation algorithm. In the spirit of learning-augmented algorithms, we then show that if the forest found in step (1) overlaps with an optimal MST, we can approximate the original MST problem in subquadratic time, where the approximation factor depends on a measure of overlap. In practice, we find nearly optimal spanning trees for a wide range of metrics, while being orders of magnitude faster than exact algorithms.