DMCVDSCOOCApr 9, 2024

The Central Spanning Tree Problem

arXiv:2404.06447v11 citationsh-index: 53
Originality Incremental advance
AI Analysis

This addresses the need for more stable tree-based data summarization in fields like biology and computer vision, though it is incremental as it builds on prior spanning tree definitions.

The paper tackles the problem of spanning trees being sensitive to noise in data by introducing the (branched) central spanning tree, which empirically shows improved robustness to noise compared to existing methods like minimum spanning and Steiner trees.

Spanning trees are an important primitive in many data analysis tasks, when a data set needs to be summarized in terms of its "skeleton", or when a tree-shaped graph over all observations is required for downstream processing. Popular definitions of spanning trees include the minimum spanning tree and the optimum distance spanning tree, a.k.a. the minimum routing cost tree. When searching for the shortest spanning tree but admitting additional branching points, even shorter spanning trees can be realized: Steiner trees. Unfortunately, both minimum spanning and Steiner trees are not robust with respect to noise in the observations; that is, small perturbations of the original data set often lead to drastic changes in the associated spanning trees. In response, we make two contributions when the data lies in a Euclidean space: on the theoretical side, we introduce a new optimization problem, the "(branched) central spanning tree", which subsumes all previously mentioned definitions as special cases. On the practical side, we show empirically that the (branched) central spanning tree is more robust to noise in the data, and as such is better suited to summarize a data set in terms of its skeleton. We also propose a heuristic to address the NP-hard optimization problem, and illustrate its use on single cell RNA expression data from biology and 3D point clouds of plants.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes