Interpolating between Clustering and Dimensionality Reduction with Gromov-Wasserstein
This work addresses the need to summarize real data by blending dimensionality reduction and clustering; the contribution is incremental in that it adapts existing methods.
The paper tackles the joint reduction of sample and feature sizes by adapting dimensionality reduction objectives with a semi-relaxed Gromov-Wasserstein optimal transport formulation. The resulting method yields competitive hard clustering and is used to visualize image datasets.
We present a versatile adaptation of existing dimensionality reduction (DR) objectives, enabling the simultaneous reduction of both sample and feature sizes. Correspondences between input and embedding samples are computed through a semi-relaxed Gromov-Wasserstein optimal transport (OT) problem. When the embedding sample size matches that of the input, our model recovers popular classical DR models. When the embedding's dimensionality is unconstrained, we show that the OT plan delivers a competitive hard clustering. We emphasize the importance of intermediate stages that blend DR and clustering for summarizing real data and apply our method to visualize datasets of images.
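To make the role of the OT plan concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes a squared loss between pairwise input similarities `C_x` and pairwise embedding similarities `C_z`, a transport plan `T` whose row sums (input marginal) are fixed while its column sums are left free, as in a semi-relaxed problem, and a hard clustering read off by a row-wise argmax. The names `gw_objective`, `hard_clusters`, and the toy data are hypothetical and chosen only for illustration; no optimization over `T` is performed here.

```python
import numpy as np


def gw_objective(C_x, C_z, T):
    """Squared-loss Gromov-Wasserstein cost (assumed loss, for illustration):
    sum_{i,j,k,l} (C_x[i,k] - C_z[j,l])**2 * T[i,j] * T[k,l].
    """
    p = T.sum(axis=1)  # input marginal: fixed in the semi-relaxed setting
    q = T.sum(axis=0)  # embedding marginal: left free, induced by T
    term_x = p @ (C_x ** 2) @ p
    term_z = q @ (C_z ** 2) @ q
    cross = np.sum((C_x @ T @ C_z.T) * T)
    return term_x + term_z - 2.0 * cross


def hard_clusters(T):
    """Assign each input sample to the embedding point receiving most of its mass."""
    return T.argmax(axis=1)


# Toy data (hypothetical): four 1-D points forming two tight groups.
X = np.array([[0.0], [0.1], [5.0], [5.1]])
C_x = (X - X.T) ** 2  # pairwise squared distances between inputs

# Two embedding prototypes whose squared distance matches the gap between groups.
C_z = np.array([[0.0, 25.0], [25.0, 0.0]])

# Two candidate plans with uniform input marginal (each row sums to 1/4).
T_good = np.array([[1, 0], [1, 0], [0, 1], [0, 1]]) / 4  # respects the groups
T_bad = np.array([[1, 0], [0, 1], [1, 0], [0, 1]]) / 4   # splits each group

labels = hard_clusters(T_good)
cost_good = gw_objective(C_x, C_z, T_good)
cost_bad = gw_objective(C_x, C_z, T_bad)
```

In this toy setting the plan that keeps each group together has the lower cost, which illustrates why a plan minimizing such an objective can act as a clustering when the number of embedding points is smaller than the number of inputs. When the number of embedding points equals the number of inputs, a plan of this form can reduce to a one-to-one matching, which corresponds to the regime in which the abstract says classical DR models are recovered.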