Exact inference under the perfect phylogeny model
This work addresses a computational bottleneck in evolutionary biology for researchers who use the Perfect Phylogeny Model (PPM) to analyze genetic data, offering exact inference in place of the approximate and heuristic approaches used by existing tools.
The authors tackled the problem of exact inference under the Perfect Phylogeny Model (PPM) for learning trees from noisy variant allele frequency data, a hard learning problem for which existing tools rely on approximate or heuristic algorithms. They developed EXACT, a tool that performs exact inference by exploring the space of all possible phylogenetic trees when the number of nodes is small; it outperforms several existing tools and provides exact statistics on the distribution of trees that might explain the data.
Motivation: Many inference tools use the Perfect Phylogeny Model (PPM) to learn trees from noisy variant allele frequency (VAF) data. Learning in this setting is hard, and existing tools use approximate or heuristic algorithms. An algorithmic improvement is important to help disentangle the limitations of the PPM's assumptions from the limitations in our capacity to learn under it.
Results: We make such an improvement in the scenario where the mutations that are relevant for evolution can be clustered into a small number of groups and the trees to be reconstructed have a small number of nodes. We use a careful combination of algorithms, software, and hardware to develop EXACT: a tool that can explore the space of all possible phylogenetic trees and perform exact inference under the PPM with noisy data. EXACT allows users to obtain not just the most likely tree for some input data, but also exact statistics about the distribution of trees that might explain the data. We show that EXACT outperforms several existing tools for this same task.
Availability: https://github.com/surjray-repos/EXACT
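To make the idea of exploring the space of all possible trees concrete, the following is a minimal illustrative sketch in Python. It is not EXACT's implementation, and it handles only the noise-free case. It assumes node 0 is the root, that `freqs[v][s]` is the (clustered) frequency of node `v` in sample `s`, and that a tree is consistent with the PPM when, in every sample, each node's frequency is at least the sum of its children's frequencies. The function names and the input layout are hypothetical and chosen for illustration only.

```python
from itertools import product


def reaches_root(parent, v):
    """Return True if following parent pointers from v ends at node 0."""
    seen = set()
    while v != 0:
        if v in seen:  # revisited a node: v sits on a cycle, not in a tree
            return False
        seen.add(v)
        v = parent[v]
    return True


def enumerate_trees(n):
    """Yield every tree on nodes 0..n-1 rooted at node 0, as a parent list."""
    for parents in product(range(n), repeat=n - 1):
        parent = [None] + list(parents)  # parent[0] is None: node 0 is the root
        if all(reaches_root(parent, v) for v in range(1, n)):
            yield parent


def satisfies_sum_condition(parent, freqs, tol=1e-9):
    """Check, per sample, that each node's frequency covers its children's sum."""
    n = len(parent)
    for s in range(len(freqs[0])):
        child_sum = [0.0] * n
        for v in range(1, n):
            child_sum[parent[v]] += freqs[v][s]
        if any(child_sum[u] > freqs[u][s] + tol for u in range(n)):
            return False
    return True


# Hypothetical noise-free frequencies: 4 nodes (rows) by 2 samples (columns).
freqs = [
    [1.0, 1.0],
    [0.6, 0.5],
    [0.3, 0.4],
    [0.2, 0.1],
]

all_trees = list(enumerate_trees(len(freqs)))
consistent = [t for t in all_trees if satisfies_sum_condition(t, freqs)]
print(f"{len(consistent)} of {len(all_trees)} trees are consistent")
```

The number of candidate trees grows very quickly with the number of nodes, which is why exhaustive exploration of this kind is only plausible in the small-tree scenario the abstract describes. With noisy data, a feasibility check like the one above would have to be replaced by a per-tree fitting cost, which this sketch does not attempt.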