ML, LG | Feb 28, 2020

Spectral neighbor joining for reconstruction of latent tree models

arXiv:2002.12547v3 | 4 citations
AI Analysis

This work addresses the reconstruction of latent tree models, such as evolutionary lineages in phylogenetics, and reports improved sample efficiency for trees with many leaves or long edges.

The authors tackle the problem of inferring latent tree topologies from observed data, common in fields like phylogenetics, by developing Spectral Neighbor Joining (SNJ), a method that scores groups of observed variables with a spectral measure of cohesion. They prove that SNJ is consistent, derive a sufficient condition for correct recovery, and show via simulations that it needs fewer samples than several other reconstruction methods to accurately recover trees with many leaves or long edges.

A common assumption in multiple scientific applications is that the distribution of observed data can be modeled by a latent tree graphical model. An important example is phylogenetics, where the tree models the evolutionary lineages of a set of observed organisms. Given a set of independent realizations of the random variables at the leaves of the tree, a key challenge is to infer the underlying tree topology. In this work we develop Spectral Neighbor Joining (SNJ), a novel method to recover the structure of latent tree graphical models. Given a matrix that contains a measure of similarity between all pairs of observed variables, SNJ computes a spectral measure of cohesion between groups of observed variables. We prove that SNJ is consistent, and derive a sufficient condition for correct tree recovery from an estimated similarity matrix. Combining this condition with a concentration of measure result on the similarity matrix, we bound the number of samples required to recover the tree with high probability. We illustrate via extensive simulations that in comparison to several other reconstruction methods, SNJ requires fewer samples to accurately recover trees with a large number of leaves or long edges.
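The idea of scoring groups of leaves from a similarity matrix can be illustrated with a minimal sketch. The text above does not spell out the cohesion measure, so the code below assumes one plausible choice: the second-largest singular value of the block of similarities between a candidate group and all remaining leaves. It also assumes the similarity is multiplicative along tree paths, in which case that block has rank one exactly when the group is separated from the rest by a single edge. The names `cohesion_score`, `best_merge`, and `make_quartet_similarity` are hypothetical helpers for illustration, not the authors' API.

```python
import itertools

import numpy as np


def make_quartet_similarity():
    """Toy similarity matrix for a 4-leaf tree ((0,1),(2,3)).

    Assumption: similarity between two leaves is the product of the
    edge weights on the path joining them.
    """
    leaf_w = [0.9, 0.8, 0.7, 0.6]   # weights of the four leaf edges
    inner_w = 0.5                   # weight of the single internal edge
    side = [0, 0, 1, 1]             # which internal node each leaf hangs off
    R = np.eye(4)
    for i, j in itertools.combinations(range(4), 2):
        cross = 1.0 if side[i] == side[j] else inner_w
        R[i, j] = R[j, i] = leaf_w[i] * leaf_w[j] * cross
    return R


def cohesion_score(R, group):
    """Second-largest singular value of R[group, complement].

    Under the multiplicative assumption, a value near zero means the
    block is (numerically) rank one, i.e. the group looks like a
    subtree cut off by one edge. Smaller is more cohesive.
    """
    group = sorted(group)
    rest = [k for k in range(R.shape[0]) if k not in group]
    block = R[np.ix_(group, rest)]
    s = np.linalg.svd(block, compute_uv=False)
    return float(s[1]) if s.size > 1 else 0.0


def best_merge(R, groups):
    """Return the index pair of the two groups whose union scores lowest."""
    pairs = itertools.combinations(range(len(groups)), 2)
    return min(
        pairs,
        key=lambda p: cohesion_score(R, groups[p[0]] + groups[p[1]]),
    )


if __name__ == "__main__":
    R = make_quartet_similarity()
    print(cohesion_score(R, [0, 1]))   # true sibling pair
    print(cohesion_score(R, [0, 2]))   # pair split across the internal edge
    print(best_merge(R, [[0], [1], [2], [3]]))
```

Repeating `best_merge` and replacing the two chosen groups with their union gives a greedy, neighbor-joining-style procedure. With an estimated similarity matrix the scores are only approximately zero for true groups, which is where a recovery condition and sample bound like those in the abstract come in.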

Code Implementations: 3 repos
