On the Approximation of Phylogenetic Distance Functions by Artificial Neural Networks
This work addresses the challenge of scalable phylogenetic inference for biologists, offering a computationally efficient alternative to model-based methods.
The paper tackles the problem of inferring phylogenetic relationships by developing minimal neural network architectures that approximate classic phylogenetic distance functions, achieving results comparable to state-of-the-art inference methods while scaling to large datasets.
Inferring the phylogenetic relationships among a sample of organisms is a fundamental problem in modern biology. While distance-based hierarchical clustering algorithms achieved early success on this task, they have been supplanted by Bayesian and maximum likelihood search procedures based on complex models of molecular evolution. In this work we describe minimal neural network architectures that can approximate classic phylogenetic distance functions and the properties required to learn distances under a variety of molecular evolutionary models. In contrast to model-based inference (and recently proposed model-free convolutional and transformer networks), these architectures have a small computational footprint and are scalable to large numbers of taxa and molecular characters. The learned distance functions generalize well and, given an appropriate training dataset, achieve results comparable to state-of-the-art inference methods.
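
For readers unfamiliar with the term, the sketch below shows one well-known example of a classic phylogenetic distance function: the Jukes-Cantor (JC69) corrected distance between two aligned DNA sequences. The abstract does not say which distance functions the networks approximate, so the choice of JC69 is an assumption made for illustration, and the function names p_distance and jc69_distance are hypothetical. This is not the paper's method, only the kind of closed-form target such a network might be trained to reproduce.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned sites at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)


def jc69_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor corrected distance: d = -(3/4) * ln(1 - (4/3) * p)."""
    p = p_distance(seq_a, seq_b)
    # At p >= 3/4 the sequences are saturated and the correction is undefined;
    # returning infinity is an illustrative convention, not a standard.
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


if __name__ == "__main__":
    # One mismatch in eight sites: the corrected distance exceeds the raw
    # proportion because it accounts for unobserved multiple substitutions.
    print(p_distance("ACGTACGT", "ACGTACGA"))
    print(jc69_distance("ACGTACGT", "ACGTACGA"))
```

A pairwise matrix of such distances is the input to distance-based clustering methods of the kind the abstract mentions.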