PEMar 31, 2023
Constructing Phylogenetic Networks via Cherry Picking and Machine LearningGiulia Bernardini, Leo van Iersel, Esther Julien et al.
Combining a set of phylogenetic trees into a single phylogenetic network that explains all of them is a fundamental challenge in evolutionary studies. Existing methods are computationally expensive and can either handle only small numbers of phylogenetic trees or are limited to severely restricted classes of networks. In this paper, we apply the recently-introduced theoretical framework of cherry picking to design a class of efficient heuristics that are guaranteed to produce a network containing each of the input trees, for datasets consisting of binary trees. Some of the heuristics in this framework are based on the design and training of a machine learning model that captures essential information on the structure of the input trees and guides the algorithms towards better solutions. We also propose simple and fast randomised heuristics that prove to be very effective when run multiple times. Unlike the existing exact methods, our heuristics are applicable to datasets of practical size, and the experimental study we conducted on both simulated and real data shows that these solutions are qualitatively good, always within some small constant factor from the optimum. Moreover, our machine-learned heuristics are one of the first applications of machine learning to phylogenetics and show its promise.
57.1DSMay 29
Tree Containment Parameterized by ScanwidthLeo van Iersel, Mark Jones, Mathias Weller
TREE CONTAINMENT is a central decision problem in mathematical phylogenetics, asking whether a given rooted phylogenetic tree is embeddable in ("displayed by") a given rooted phylogenetic network. While the problem is NP-complete for general networks, many algorithmic advances have relied on structural parameters that capture how "tree-like" a network is. In this paper we investigate TREE CONTAINMENT under the structural parameter scanwidth, a directed width measure generalizing popular parameters measuring tree-likeness of phylogenetic networks. We first present a parameterized algorithm that solves the problem in $O(4^{k + k\log{k}} n + nm^2)$ time, where $n$ and $m$ are the numbers of nodes and arcs in the network and $k$ is the width of a given tree-extension. Complementing this upper bound, we prove a matching lower bound under the Exponential-Time Hypothesis (ETH), showing that there is no algorithm for TREE CONTAINMENT that runs in $2^{o(c\log{c})} n^{O(1)}$ time, even on binary inputs, where $c$ is the directed cutwidth of the input network, which upper-bounds the scanwidth $k$.
DSFeb 17
Exact and Heuristic Computation of the Scanwidth of Directed Acyclic GraphsNiels Holtgrefe, Leo van Iersel, Mark Jones
To measure the tree-likeness of a directed acyclic graph (DAG), a new width parameter that considers the directions of the arcs was recently introduced: scanwidth. We present the first algorithm that efficiently computes the exact scanwidth of general DAGs. For DAGs with one root and scanwidth $k$ it runs in $O(k \cdot n^k \cdot m)$ time. The algorithm also functions as an FPT algorithm with complexity $O(2^{4 \ell - 1} \cdot \ell \cdot n + n^2)$ for phylogenetic networks of level-$\ell$, a type of DAG used to depict evolutionary relationships among species. Our algorithm performs well in practice, being able to compute the scanwidth of synthetic networks up to 30 reticulations and 100 leaves within 500 seconds. Furthermore, we propose a heuristic that obtains an average practical approximation ratio of 1.5 on these networks. While we prove that the scanwidth is bounded from below by the treewidth of the underlying undirected graph, experiments suggest that for networks the parameters are close in practice.
41.4DSApr 30
Average-Tree Phylogenetic Diversity Parameterized by Scanwidth and InvisibilityLeo van Iersel, Mark Jones, Jannik Schestag et al.
We investigate parameterized algorithms for computing the average-tree phylogenetic diversity (APD) in rooted phylogenetic networks, studying the problem under different structural parameters that capture the deviation of a network from a tree. Our primary parameter is the scanwidth, a measure of the tree-likeness of a given directed acyclic graph. We show that a subset of taxa with maximum APD can be found in polynomial time in phylogenetic networks of scanwidth at most 2, but becomes NP-hard in networks of scanwidth 3. Further, we design an algorithm that computes the APD of a given set of taxa in O(2^sw n) time, where sw denotes the scanwidth and n the number of taxa in the input network. Finally, we give a linear-time algorithm for computing the APD of a given set of taxa if the network induced by these taxa is reticulation-visible. We generalize this algorithm to still run in polynomial time if each biconnected component of the induced network has only constantly many invisible reticulations.
COMar 7
A Class of Unrooted Phylogenetic Networks Inspired by the Properties of Rooted Tree-Child NetworksLeo van Iersel, Mark Jones, Simone Linz et al.
A directed phylogenetic network is tree-child if every non-leaf vertex has a child that is not a reticulation. As a class of directed phylogenetic networks, tree-child networks are very useful from a computational perspective. For example, several computationally difficult problems in phylogenetics become tractable when restricted to tree-child networks. At the same time, the class itself is rich enough to contain quite complex networks. Furthermore, checking whether a directed network is tree-child can be done in polynomial time. In this paper, we seek a class of undirected phylogenetic networks that is rich and computationally useful in a similar way to the class tree-child directed networks. A natural class to consider for this role is the class of tree-child-orientable networks which contains all those undirected phylogenetic networks whose edges can be oriented to create a tree-child network. However, we show here that recognizing such networks is NP-hard, even for binary networks, and as such this class is inappropriate for this role. Towards finding a class of undirected networks that fills a similar role to directed tree-child networks, we propose new classes called $q$-cuttable networks, for any integer $q\geq 1$. We show that these classes have many of the desirable properties, similar to tree-child networks in the rooted case, including being recognizable in polynomial time, for all $q\geq 1$. Towards showing the computational usefulness of the class, we show that the NP-hard problem Tree Containment is polynomial-time solvable when restricted to $q$-cuttable networks with $q\geq 3$.
PEApr 15, 2024
Solving the Tree Containment Problem Using Graph Neural NetworksArkadiy Dushatskiy, Esther Julien, Leen Stougie et al.
Tree Containment is a fundamental problem in phylogenetics useful for verifying a proposed phylogenetic network, representing the evolutionary history of certain species. Tree Containment asks whether the given phylogenetic tree (for instance, constructed from a DNA fragment showing tree-like evolution) is contained in the given phylogenetic network. In the general case, this is an NP-complete problem. We propose to solve it approximately using Graph Neural Networks. In particular, we propose to combine the given network and the tree and apply a Graph Neural Network to this network-tree graph. This way, we achieve the capability of solving the tree containment instances representing a larger number of species than the instances contained in the training dataset (i.e., our algorithm has the inductive learning ability). Our algorithm demonstrates an accuracy of over $95\%$ in solving the tree containment problem on instances with up to 100 leaves.