Data Requirement for Phylogenetic Inference from Multiple Loci: A New Distance Method
This work addresses the challenge of accurate phylogenetic inference from multiple genes, offering biologists incremental improvements in the handling of gene-level estimation errors.
The paper tackles the problem of estimating species trees from multiple genes while accounting for gene tree estimation errors. It provides the first full data-requirement analysis of such a reconstruction method, along with a novel algorithm that provably improves over previous methods in a specific regime.
We consider the problem of estimating the evolutionary history of a set of species (a phylogeny, or species tree) from several genes. It is known that the evolutionary histories of individual genes (gene trees) might be topologically distinct from each other and from the underlying species tree, possibly confounding phylogenetic analysis. A further complication in practice is that one has to estimate gene trees from molecular sequences of finite length. We provide the first full data-requirement analysis of a species tree reconstruction method that takes into account estimation errors at the gene level. Under that data-requirement criterion, we also devise a novel reconstruction algorithm that provably improves over all previous methods in a regime of interest.