LGCEDSSep 12, 2014

A Fast Quartet Tree Heuristic for Hierarchical Clustering

arXiv:1409.4276v1
Originality Synthesis-oriented
AI Analysis

This work addresses hierarchical clustering for non-phylogeny data across domains, but it is incremental as it improves an existing heuristic with faster computation.

The paper tackles the Minimum Quartet Tree Cost problem for hierarchical clustering by presenting a Monte Carlo heuristic based on randomized hill climbing to approximate the optimal weight tree from quartet topology weights, achieving a speed improvement by a factor of 1,000 to 10,000 over the original version. It compares performance and running time with methods like UPGMA, BioNJ, and NJ on genomic data, showing competitive results.

The Minimum Quartet Tree Cost problem is to construct an optimal weight tree from the $3{n \choose 4}$ weighted quartet topologies on $n$ objects, where optimality means that the summed weight of the embedded quartet topologies is optimal (so it can be the case that the optimal tree embeds all quartets as nonoptimal topologies). We present a Monte Carlo heuristic, based on randomized hill climbing, for approximating the optimal weight tree, given the quartet topology weights. The method repeatedly transforms a dendrogram, with all objects involved as leaves, achieving a monotonic approximation to the exact single globally optimal tree. The problem and the solution heuristic has been extensively used for general hierarchical clustering of nontree-like (non-phylogeny) data in various domains and across domains with heterogeneous data. We also present a greatly improved heuristic, reducing the running time by a factor of order a thousand to ten thousand. All this is implemented and available, as part of the CompLearn package. We compare performance and running time of the original and improved versions with those of UPGMA, BioNJ, and NJ, as implemented in the SplitsTree package on genomic data for which the latter are optimized. Keywords: Data and knowledge visualization, Pattern matching--Clustering--Algorithms/Similarity measures, Hierarchical clustering, Global optimization, Quartet tree, Randomized hill-climbing,

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes