Norbert Zeh

CO Mar 7

A Class of Unrooted Phylogenetic Networks Inspired by the Properties of Rooted Tree-Child Networks

Leo van Iersel, Mark Jones, Simone Linz et al.

A directed phylogenetic network is tree-child if every non-leaf vertex has a child that is not a reticulation. As a class of directed phylogenetic networks, tree-child networks are very useful from a computational perspective. For example, several computationally difficult problems in phylogenetics become tractable when restricted to tree-child networks. At the same time, the class itself is rich enough to contain quite complex networks. Furthermore, checking whether a directed network is tree-child can be done in polynomial time. In this paper, we seek a class of undirected phylogenetic networks that is rich and computationally useful in a similar way to the class of tree-child directed networks. A natural class to consider for this role is the class of tree-child-orientable networks, which contains all those undirected phylogenetic networks whose edges can be oriented to create a tree-child network. However, we show here that recognizing such networks is NP-hard, even for binary networks, and as such this class is inappropriate for this role. Towards finding a class of undirected networks that fills a similar role to directed tree-child networks, we propose new classes called $q$-cuttable networks, for any integer $q\geq 1$. We show that these classes have many of the desirable properties, similar to tree-child networks in the rooted case, including being recognizable in polynomial time, for all $q\geq 1$. Towards showing the computational usefulness of the class, we show that the NP-hard problem Tree Containment is polynomial-time solvable when restricted to $q$-cuttable networks with $q\geq 3$.
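The tree-child condition stated at the start of the abstract can be checked directly from an edge list, which illustrates why recognition is polynomial in the directed case. The following is a minimal sketch, not code from the paper: the function name `is_tree_child` and the edge-list input format are assumptions made here, and a reticulation is taken to be a vertex with in-degree at least 2. The sketch does not validate that the input is a proper rooted, acyclic phylogenetic network.

```python
from collections import defaultdict


def is_tree_child(edges):
    """Check the tree-child condition on a directed network.

    `edges` is an iterable of (parent, child) pairs (hypothetical input
    format). Every vertex with at least one child must have a child whose
    in-degree is below 2, i.e. a child that is not a reticulation.
    """
    children = defaultdict(list)
    indegree = defaultdict(int)
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1

    # Only non-leaf vertices appear as keys of `children`.
    for kids in children.values():
        if all(indegree[c] >= 2 for c in kids):
            return False  # every child of this vertex is a reticulation
    return True


# One reticulation h; its parents a and b each keep a non-reticulation child.
good = [("r", "a"), ("r", "b"), ("a", "h"), ("b", "h"),
        ("a", "x"), ("b", "y"), ("h", "z")]

# Both children of a (and of b) are reticulations.
bad = [("r", "a"), ("r", "b"), ("a", "h1"), ("a", "h2"),
       ("b", "h1"), ("b", "h2"), ("h1", "x"), ("h2", "y")]

print(is_tree_child(good), is_tree_child(bad))
```

The check makes one pass over the edges and one pass over the child lists, so it runs in time linear in the number of edges.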

IR Jan 31, 2020

Enhancement of Short Text Clustering by Iterative Classification

Md Rashadul Hasan Rakib, Norbert Zeh, Magdalena Jankowska et al.

Short text clustering is a challenging task due to the lack of signal contained in such short texts. In this work, we propose iterative classification as a method to boost the clustering quality (e.g., accuracy) of short texts. Given a clustering of short texts obtained using an arbitrary clustering algorithm, iterative classification applies outlier removal to obtain outlier-free clusters. Then it trains a classification algorithm using the non-outliers based on their cluster distributions. Using the trained classification model, iterative classification reclassifies the outliers to obtain a new set of clusters. By repeating this several times, we obtain a much improved clustering of texts. Our experimental results show that the proposed clustering enhancement method not only improves the clustering quality of different clustering methods (e.g., k-means, k-means--, and hierarchical clustering) but also outperforms the state-of-the-art short text clustering methods on several short text datasets by a statistically significant margin.
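The loop described in the abstract (remove outliers, train on the rest, reclassify the outliers, repeat) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract names neither the outlier detector nor the classifier, so distance to the cluster centroid stands in for outlier detection and a nearest-centroid rule stands in for the trained classifier. The function names, the `keep_fraction` and `rounds` parameters, the early stop when labels no longer change, and the toy 2-D vectors (in place of text embeddings) are all hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def iterative_classification(vectors, labels, rounds=5, keep_fraction=0.8):
    """Refine an initial clustering by repeated outlier reclassification."""
    labels = list(labels)
    for _ in range(rounds):
        # Group point indices by their current cluster label.
        clusters = {}
        for i, lab in enumerate(labels):
            clusters.setdefault(lab, []).append(i)

        models = {}    # stand-in classifier: one centroid per cluster
        outliers = []
        for lab, idxs in clusters.items():
            c = centroid([vectors[i] for i in idxs])
            ranked = sorted(idxs, key=lambda i: math.dist(vectors[i], c))
            n_keep = max(1, int(len(ranked) * keep_fraction))
            # Assumed outlier rule: points farthest from the centroid.
            outliers.extend(ranked[n_keep:])
            # "Training" uses only the non-outliers of this cluster.
            models[lab] = centroid([vectors[i] for i in ranked[:n_keep]])

        # Reclassify each outlier with the stand-in classifier.
        new_labels = list(labels)
        for i in outliers:
            new_labels[i] = min(
                models, key=lambda lab: math.dist(vectors[i], models[lab])
            )
        if new_labels == labels:
            break  # assumed stopping rule: no label changed this round
        labels = new_labels
    return labels


# Toy data: two well-separated groups, with point 4 initially mislabeled.
vectors = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11)]
initial = [0, 0, 0, 0, 0, 1, 1, 1]
refined = iterative_classification(vectors, initial)
print(refined)
```

On this toy input the mislabeled point is the farthest from its cluster's centroid, so it is treated as an outlier in the first round and reassigned to the other cluster.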
