Sparse clustering via the Deterministic Information Bottleneck algorithm
The work addresses sparse data clustering in domains such as genomics, but the contribution appears incremental, offering a competitive alternative to existing algorithms.
The paper tackles the challenge of clustering when the cluster structure is confined to a subset of features, presenting an information-theoretic framework for joint feature weighting and clustering. It demonstrates the method's effectiveness through simulations on synthetic data and an application to a real-world genomics dataset.
Cluster analysis relates to the task of assigning objects into groups which ideally present some desirable characteristics. When a cluster structure is confined to a subset of the feature space, traditional clustering techniques face unprecedented challenges. We present an information-theoretic framework that overcomes the problems associated with sparse data, allowing for joint feature weighting and clustering. Our proposal constitutes a competitive alternative to existing clustering algorithms for sparse data, as demonstrated through simulations on synthetic data. The effectiveness of our method is established by an application on a real-world genomics data set.