PhyloVAE: Unsupervised Learning of Phylogenetic Trees via Variational Autoencoders
This work addresses the need for high-resolution and efficient analysis of evolutionary relationships in computational biology, representing an incremental improvement over classical distance-based methods.
The paper tackles the problem of learning representations of phylogenetic tree structures by introducing PhyloVAE, an unsupervised framework based on variational autoencoders that supports both robust representation learning and fast generation of tree topologies.
Learning informative representations of phylogenetic tree structures is essential for analyzing evolutionary relationships. Classical distance-based methods have been widely used to project phylogenetic trees into Euclidean space, but they are often sensitive to the choice of distance metric and may lack sufficient resolution. In this paper, we introduce phylogenetic variational autoencoders (PhyloVAEs), an unsupervised learning framework designed for representation learning and generative modeling of tree topologies. Leveraging an efficient encoding mechanism inspired by autoregressive tree topology generation, we develop a deep latent-variable generative model that facilitates fast, parallelized topology generation. PhyloVAE combines this generative model with a collaborative inference model based on learnable topological features, allowing for high-resolution representations of phylogenetic tree samples. Extensive experiments demonstrate PhyloVAE's robust representation learning capabilities and fast generation of phylogenetic tree topologies.
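The abstract does not spell out the encoding, but the general idea behind autoregressive topology generation can be illustrated with the classic stepwise leaf-addition scheme. Leaves 0, 1, 2 form a star, and each further leaf is attached by subdividing one existing edge. A tree that already has k leaves has 2k - 3 edges, so every unrooted binary topology on n leaves corresponds to exactly one integer vector s of length n - 3 with s[i] in range(2i + 3). This gives (2n - 5)!! topologies in total. The sketch below is a minimal illustration under our own assumptions: the function names `decode`, `splits`, and `sample_vector` and the edge-indexing convention are hypothetical, and PhyloVAE's actual encoding may differ in its details.

```python
import itertools
import random
from collections import defaultdict


def decode(s, n):
    """Build an unrooted binary tree on leaves 0..n-1 from an integer vector s.

    s[i] selects the existing edge that leaf i+3 is attached to. With i+3 leaves
    already placed, the tree has 2*(i+3)-3 = 2*i+3 edges, so s[i] is in range(2*i+3).
    Internal nodes are numbered n, n+1, ... (a convention assumed for this sketch).
    """
    if len(s) != n - 3:
        raise ValueError("s must have length n - 3")
    # Start from the unique 3-leaf topology: leaves 0, 1, 2 joined at node n.
    edges = [(0, n), (1, n), (2, n)]
    next_internal = n + 1
    for i, e in enumerate(s):
        if not 0 <= e < len(edges):
            raise ValueError(f"s[{i}]={e} out of range for {len(edges)} edges")
        u, v = edges[e]
        w = next_internal
        next_internal += 1
        # Subdivide edge (u, v) with a new internal node w and hang leaf i+3 off w.
        edges[e] = (u, w)
        edges.extend([(w, v), (w, i + 3)])
    return edges


def splits(edges, n):
    """Return the set of nontrivial leaf bipartitions, which identifies the topology."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    result = set()
    for u, v in edges:
        # Collect the leaves reachable from v without crossing the edge (u, v).
        stack, seen, side = [v], {u, v}, set()
        while stack:
            x = stack.pop()
            if x < n:
                side.add(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        if 1 < len(side) < n - 1:
            # Canonical form: store the side of the split that excludes leaf 0.
            result.add(frozenset(side if 0 not in side else set(range(n)) - side))
    return frozenset(result)


def sample_vector(n, rng):
    # Each coordinate is drawn independently. Its valid range depends only on its
    # position, so every such vector decodes to a valid tree topology.
    return [rng.randrange(2 * i + 3) for i in range(n - 3)]


if __name__ == "__main__":
    n = 6
    vectors = itertools.product(*[range(2 * i + 3) for i in range(n - 3)])
    topologies = {splits(decode(list(s), n), n) for s in vectors}
    print(len(topologies))  # 105 = (2*6 - 5)!! = 7 * 5 * 3
    print(decode(sample_vector(n, random.Random(0)), n))
```

The point to take from this scheme is that the valid range of each coordinate is fixed in advance and does not depend on earlier choices. A decoder that outputs one categorical distribution per coordinate could therefore sample all coordinates at once rather than leaf by leaf. We read this as one plausible way to obtain the "fast, parallelized topology generation" described above, though the paper's generative model may be structured differently.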