PR · LG · ST · PE · Apr 21, 2015

Distance-based species tree estimation: information-theoretic trade-off between number of loci and sequence length under the coalescent

arXiv:1504.05289v2 · 10 citations
Originality: Incremental advance
AI Analysis

This work addresses efficient species tree estimation by giving theoretical bounds on how much data is required. It is an incremental advance because it builds on existing multispecies coalescent and sparse signal detection frameworks.

The paper studies how to reconstruct a phylogeny from multiple genes under the multispecies coalescent by connecting the problem to sparse signal detection. From this connection it derives an information-theoretic trade-off: to detect a branch of length f, the number of genes needed scales as m = Θ(1/[f^2 √k]), where k is the sequence length of each gene.
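The scaling law can be made concrete with a few lines of code. This is a minimal sketch, assuming only the stated form m = Θ(1/[f^2 √k]); the constant `c` is a hypothetical placeholder, since Θ(·) hides constants, so only ratios between settings are meaningful.

```python
import math


def relative_genes_needed(f: float, k: int, c: float = 1.0) -> float:
    """Return c / (f**2 * sqrt(k)), the stated scaling of the gene count m.

    f: length of the branch to detect.
    k: sequence length of each gene.
    c: hypothetical constant (hidden by Theta); compare ratios, not raw values.
    """
    if f <= 0 or k <= 0:
        raise ValueError("f and k must be positive")
    return c / (f ** 2 * math.sqrt(k))


# Requirements relative to a baseline of f = 0.1 and k = 100.
base = relative_genes_needed(0.1, 100)
print("halving f:     ", relative_genes_needed(0.05, 100) / base)
print("quadrupling k: ", relative_genes_needed(0.1, 400) / base)
```

Halving the branch length multiplies the required number of genes by about four, whereas quadrupling the sequence length only halves it. Because m depends on k through √k, longer sequences buy less than additional genes do when the branch is short.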

We consider the reconstruction of a phylogeny from multiple genes under the multispecies coalescent. We establish a connection with the sparse signal detection problem, where one seeks to distinguish between a distribution and a mixture of the distribution and a sparse signal. Using this connection, we derive an information-theoretic trade-off between the number of genes, $m$, needed for an accurate reconstruction and the sequence length, $k$, of the genes. Specifically, we show that to detect a branch of length $f$, one needs $m = \Theta(1/[f^{2} \sqrt{k}])$.
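To illustrate what "distinguishing a distribution from a mixture of the distribution and a sparse signal" means, here is a generic Gaussian instance of sparse signal detection. This is an assumption-laden sketch, not the paper's model: the null N(0, 1), the mixture (1 - eps)·N(0, 1) + eps·N(mu, 1), and the simple sum test are illustrative choices, and the sum test is only a baseline rather than an optimal detector for very sparse regimes. All function names are hypothetical.

```python
import random


def sample(n: int, eps: float, mu: float, rng: random.Random) -> list[float]:
    """Draw n values from (1 - eps) * N(0, 1) + eps * N(mu, 1).

    With eps = 0 this is the null distribution N(0, 1); each draw
    independently carries the shift mu with probability eps (the sparse signal).
    """
    return [rng.gauss(mu if rng.random() < eps else 0.0, 1.0) for _ in range(n)]


def sum_test(xs: list[float], z: float = 2.326) -> bool:
    """Reject the null when the normalized sum exceeds z.

    Under the null, sum(xs) / sqrt(n) is N(0, 1), so z = 2.326 gives
    roughly a 1% one-sided false-positive rate.
    """
    return sum(xs) / len(xs) ** 0.5 > z


def rejection_rate(n: int, eps: float, mu: float, trials: int = 200, seed: int = 0) -> float:
    """Fraction of simulated data sets on which sum_test rejects the null."""
    rng = random.Random(seed)
    return sum(sum_test(sample(n, eps, mu, rng)) for _ in range(trials)) / trials


print("null (eps = 0):          ", rejection_rate(1000, 0.0, 0.0))
print("sparse signal (eps = .05):", rejection_rate(1000, 0.05, 3.0))
```

With no signal the test rejects only rarely, while a 5% fraction of shifted draws is detected in most simulated data sets of size 1000. Shrinking `eps` or `mu` forces `n` to grow before detection succeeds, which mirrors, in this toy setting, why short branches demand more genes.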
