LGAIMLJan 25, 2024

Spectral Clustering for Discrete Distributions

arXiv:2401.13913v2
Originality Incremental advance
AI Analysis

This work addresses clustering of discrete distributions for applications like images and documents, offering a scalable and accurate alternative to existing methods, though it is incremental as it builds on spectral clustering and affinity measures.

The paper tackles the problem of clustering discrete distributions, which is traditionally done with Wasserstein barycenter methods that are inaccurate and computationally expensive, by proposing spectral clustering with distribution affinity measures and linear optimal transport, resulting in more accurate and efficient performance as shown in experiments.

The discrete distribution is often used to describe complex instances in machine learning, such as images, sequences, and documents. Traditionally, clustering of discrete distributions (D2C) has been approached using Wasserstein barycenter methods. These methods operate under the assumption that clusters can be well-represented by barycenters, which is seldom true in many real-world applications. Additionally, these methods are not scalable for large datasets due to the high computational cost of calculating Wasserstein barycenters. In this work, we explore the feasibility of using spectral clustering combined with distribution affinity measures (e.g., maximum mean discrepancy and Wasserstein distance) to cluster discrete distributions. We demonstrate that these methods can be more accurate and efficient than barycenter methods. To further enhance scalability, we propose using linear optimal transport to construct affinity matrices efficiently for large datasets. We provide theoretical guarantees for the success of our methods in clustering distributions. Experiments on both synthetic and real data show that our methods outperform existing baselines.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes