CO CG NEJul 18, 2020

Combinatorial and computational investigations of Neighbor-Joining bias

arXiv:2007.09345v317 citations

AI Analysis

This work addresses a foundational issue in computational phylogenetics by clarifying the algorithm's behavior, which is incremental as it builds on existing combinatorial descriptions.

The paper tackles the problem of understanding the combinatorial structure of the Neighbor-Joining algorithm's output space by defining agglomeration orders, establishing a bijection with weighted Motzkin paths, and deriving a formula for the number of polyhedral regions based on the number of taxa. It concludes with a computational analysis to reveal biases in algorithm implementations.

The Neighbor-Joining algorithm is a popular distance-based phylogenetic method that computes a tree metric from a dissimilarity map arising from biological data. Realizing dissimilarity maps as points in Euclidean space, the algorithm partitions the input space into polyhedral regions indexed by the combinatorial type of the trees returned. A full combinatorial description of these regions has not been found yet; different sequences of Neighbor-Joining agglomeration events can produce the same combinatorial tree, therefore associating multiple geometric regions to the same algorithmic output. We resolve this confusion by defining agglomeration orders on trees, leading to a bijection between distinct regions of the output space and weighted Motzkin paths. As a result, we give a formula for the number of polyhedral regions depending only on the number of taxa. We conclude with a computational comparison between these polyhedral regions, to unveil biases introduced in any implementation of the algorithm.

View on arXiv PDF

Similar