GeoPhy: Differentiable Phylogenetic Inference via Geometric Gradients of Tree Topologies
This addresses the problem of scalable and accurate phylogenetic inference for biologists, representing a novel method for a known bottleneck.
The paper tackled the challenge of phylogenetic inference without restricting the vast number of possible tree topologies by introducing a fully differentiable formulation using geometric representations, and GeoPhy significantly outperformed other approximate Bayesian methods on real benchmark datasets.
Phylogenetic inference, grounded in molecular evolution models, is essential for understanding the evolutionary relationships in biological data. Accounting for the uncertainty of phylogenetic tree variables, which include tree topologies and evolutionary distances on branches, is crucial for accurately inferring species relationships from molecular data and tasks requiring variable marginalization. Variational Bayesian methods are key to developing scalable, practical models; however, it remains challenging to conduct phylogenetic inference without restricting the combinatorially vast number of possible tree topologies. In this work, we introduce a novel, fully differentiable formulation of phylogenetic inference that leverages a unique representation of topological distributions in continuous geometric spaces. Through practical considerations on design spaces and control variates for gradient estimations, our approach, GeoPhy, enables variational inference without limiting the topological candidates. In experiments using real benchmark datasets, GeoPhy significantly outperformed other approximate Bayesian methods that considered whole topologies.