Anna Beer

LG
h-index: 13
3 papers
1 citation
Novelty: 53%
AI Score: 26

3 Papers

LG Jul 31, 2024
Temporal Subspace Clustering for Molecular Dynamics Data

Anna Beer, Martin Heinrigs, Claudia Plant et al.

We introduce MOSCITO (MOlecular Dynamics Subspace Clustering with Temporal Observance), a subspace clustering method for molecular dynamics data. MOSCITO groups the timesteps of a molecular dynamics trajectory into clusters in which the molecule has similar conformations. In contrast to state-of-the-art methods, MOSCITO takes advantage of the sequential relationships found in time series data. Unlike existing work, MOSCITO does not need a two-step procedure with tedious post-processing, but directly models essential properties of the data. Interpreting clusters as Markov states allows us to evaluate the clustering performance based on the resulting Markov state models. In experiments on 60 trajectories and 4 different proteins, we show that MOSCITO, a novel single-step method, achieves state-of-the-art performance. Moreover, by modeling temporal aspects, MOSCITO obtains better segmentation of trajectories, especially for small numbers of clusters.
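
The idea of interpreting clusters as Markov states can be illustrated with a minimal sketch: given one cluster label per timestep, count the transitions between labels and row-normalize them to estimate a transition matrix. This is not MOSCITO itself or the paper's evaluation pipeline; the function name `transition_matrix`, the `lag` parameter, and the toy label sequence are assumptions made for illustration.

```python
import numpy as np

def transition_matrix(labels, n_states, lag=1):
    """Estimate a row-stochastic transition matrix from a label sequence.

    labels   : one integer cluster label (0..n_states-1) per timestep
    n_states : number of clusters, read here as Markov states
    lag      : number of timesteps between the two ends of a transition
    """
    labels = np.asarray(labels)
    counts = np.zeros((n_states, n_states))
    # Count every observed transition labels[t] -> labels[t + lag].
    np.add.at(counts, (labels[:-lag], labels[lag:]), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Normalize each row; states that are never left keep an all-zero row.
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)

# Hypothetical clustering of a six-step trajectory into two states.
example_labels = [0, 0, 1, 1, 1, 0]
T = transition_matrix(example_labels, n_states=2)
```

A clustering that cuts the trajectory into long contiguous segments puts most of the probability mass on the diagonal of `T`, which is one simple way to see why temporal structure matters for the resulting model.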

LG Feb 28, 2025
DISCO: Internal Evaluation of Density-Based Clustering

Anna Beer, Lena Krieger, Pascal Weber et al.

In density-based clustering, clusters are areas of high object density separated by areas of lower object density. This notion supports arbitrarily shaped clusters and automatic detection of noise points that do not belong to any cluster. However, it is challenging to adequately evaluate the quality of density-based clustering results. Even though some existing cluster validity indices (CVIs) target arbitrarily shaped clusters, none of them captures the quality of the labeled noise. In this paper, we propose DISCO, a Density-based Internal Score for Clustering Outcomes, which is the first CVI that also evaluates the quality of noise labels. DISCO reliably evaluates density-based clusters of arbitrary shape by assessing compactness and separation. It also introduces a direct assessment of noise labels for any given clustering. Our experiments show that DISCO evaluates density-based clusterings more consistently than its competitors. It is additionally the first method to evaluate the complete labeling of density-based clustering methods, including noise labels.
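
The gap described above, that a conventional index says nothing about noise labels, can be made concrete with a minimal sketch. It computes a plain silhouette score after dropping noise points, which is one common workaround and not DISCO. The label `-1` for noise follows the convention used by scikit-learn's DBSCAN; the function name `silhouette_ignoring_noise` and the toy data are assumptions made for illustration.

```python
import numpy as np

NOISE = -1  # assumed noise label, following scikit-learn's DBSCAN convention

def silhouette_ignoring_noise(X, labels):
    """Mean silhouette over non-noise points; noise points are dropped."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    keep = labels != NOISE
    X, labels = X[keep], labels[keep]
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("need at least two non-noise clusters")
    # Pairwise Euclidean distances between the remaining points.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton cluster: silhouette is defined as 0
        a = dist[i, same].sum() / (n_same - 1)  # mean intra-cluster distance
        b = min(dist[i, labels == c].mean()     # nearest other cluster
                for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores.mean()

# Two tight clusters plus one point labeled as noise.
labels = [0, 0, 1, 1, NOISE]
X_far = [[0, 0], [0, 1], [10, 0], [10, 1], [5, 50]]      # noise point is an outlier
X_inside = [[0, 0], [0, 1], [10, 0], [10, 1], [0, 0.5]]  # "noise" sits inside cluster 0
score_far = silhouette_ignoring_noise(X_far, labels)
score_inside = silhouette_ignoring_noise(X_inside, labels)
```

Both calls return the same score, even though the second labeling marks a point in the middle of a cluster as noise. An index that ignores noise points cannot tell a sensible noise label from a poor one.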

LG Feb 19, 2025
Ultrametric Cluster Hierarchies: I Want 'em All!

Andrew Draganov, Pascal Weber, Rasmus Skibdahl Melanchton Jørgensen et al.

Hierarchical clustering is a powerful tool for exploratory data analysis, organizing data into a tree of clusterings from which a partition can be chosen. This paper generalizes these ideas by proving that, for any reasonable hierarchy, one can optimally solve any center-based clustering objective over it (such as $k$-means). Moreover, these solutions can be found exceedingly quickly and are themselves necessarily hierarchical. Thus, given a cluster tree, we show that one can quickly access a plethora of new, equally meaningful hierarchies. Just as in standard hierarchical clustering, one can then choose any desired partition from these new hierarchies. We conclude by verifying the utility of our proposed techniques across datasets, hierarchies, and partitioning schemes.