Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein
This work provides a novel approach to unsupervised learning that integrates dimensionality reduction and clustering, potentially benefiting researchers in machine learning and data analysis.
The paper tackles the problem of unifying dimensionality reduction and clustering by introducing a framework based on optimal transport and the Gromov-Wasserstein problem. The framework addresses both tasks jointly in a single optimization problem, and the authors demonstrate its relevance on image and genomic datasets.
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets. Traditionally, this involves using dimensionality reduction (DR) methods to project data onto lower-dimensional spaces or organizing points into meaningful clusters (clustering). In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem. This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem. We empirically demonstrate its relevance to the identification of low-dimensional prototypes representing data at different scales, across multiple image and genomic datasets.
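To make the Gromov-Wasserstein connection concrete, the sketch below evaluates the standard squared-loss Gromov-Wasserstein objective for a fixed coupling between n input points and m prototypes. This is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the exact objective, so the squared loss, the squared-Euclidean pairwise matrices, and all names (gw_cost, C_x, C_z, T_hard, T_unif) are hypothetical choices made here.

```python
import numpy as np

def gw_cost(C_x, C_z, T):
    """Squared-loss Gromov-Wasserstein objective for a fixed coupling T.

    C_x: (n, n) pairwise matrix between input points (assumed symmetric).
    C_z: (m, m) pairwise matrix between prototypes (assumed symmetric).
    T:   (n, m) coupling; T[i, k] is the mass sent from input i to prototype k.

    Returns sum over i, j, k, l of (C_x[i, j] - C_z[k, l])**2 * T[i, k] * T[j, l],
    computed by expanding the square so no 4-way tensor is built.
    """
    p = T.sum(axis=1)  # marginal of T on the inputs
    q = T.sum(axis=0)  # marginal of T on the prototypes
    term_x = p @ (C_x ** 2) @ p
    term_z = q @ (C_z ** 2) @ q
    cross = np.sum((C_x @ T @ C_z.T) * T)
    return term_x + term_z - 2.0 * cross

def sq_dists(A):
    """Pairwise squared Euclidean distances between the rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sum(diff ** 2, axis=-1)

# Toy data (hypothetical): four 2-D points forming two tight pairs,
# summarized by two 1-D prototypes.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
Z = np.array([[0.0], [10.0]])
C_x, C_z = sq_dists(X), sq_dists(Z)

# Coupling that sends each pair to its own prototype (a clustering).
T_hard = np.array([[0.25, 0.0], [0.25, 0.0], [0.0, 0.25], [0.0, 0.25]])
# Uninformative coupling that spreads every point over both prototypes.
T_unif = np.full((4, 2), 1.0 / 8.0)

cost_hard = gw_cost(C_x, C_z, T_hard)
cost_unif = gw_cost(C_x, C_z, T_unif)
print(cost_hard, cost_unif)
```

In this toy setting the coupling plays the role of a soft assignment of points to prototypes, while the prototype positions play the role of a low-dimensional embedding. The clustering-like coupling gives a much lower cost than the uniform one, which is the intuition behind treating clustering and dimensionality reduction as one problem.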