Cluster Exploration using Informative Manifold Projections
This work addresses the need for dimensionality reduction methods in data visualization that account for a practitioner's prior knowledge, though the contribution is incremental because it combines existing techniques.
The paper tackles the problem of generating informative embeddings for the visual exploration of high-dimensional data, that is, embeddings that factor out prior knowledge while revealing the remaining structure. It does so by optimizing a linear combination of two objectives, contrastive PCA and kurtosis projection pursuit, and validates the approach empirically on a variety of datasets.
Dimensionality reduction (DR) is one of the key tools for the visual exploration of high-dimensional data and uncovering its cluster structure in two- or three-dimensional spaces. The vast majority of DR methods in the literature do not take into account any prior knowledge a practitioner may have regarding the dataset under consideration. We propose a novel method to generate informative embeddings which not only factor out the structure associated with different kinds of prior knowledge but also aim to reveal any remaining underlying structure. To achieve this, we employ a linear combination of two objectives: firstly, contrastive PCA that discounts the structure associated with the prior information, and secondly, kurtosis projection pursuit which ensures meaningful data separation in the obtained embeddings. We formulate this task as a manifold optimization problem and validate it empirically across a variety of datasets considering three distinct types of prior knowledge. Lastly, we provide an automated framework to perform iterative visual exploration of high-dimensional data.
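To make the combined objective concrete, the following is a minimal sketch, not the authors' implementation. It assumes the prior knowledge is available as a background sample `X_background` (only one of several possible forms of prior knowledge), that the contrastive PCA term is tr(Wᵀ(C_X − α C_B)W), that cluster separation is encouraged by lowering the summed kurtosis of the projected coordinates, and that the orthonormality constraint on W is handled by projected gradient ascent with a QR retraction. The names `fit_projection`, `alpha`, and `lam` are hypothetical, as is the backtracking step-size rule.

```python
import numpy as np


def kurtosis_and_grad(X, W):
    """Summed kurtosis of the columns of Z = X W and its gradient w.r.t. W.

    X is assumed to be column-centered, so mean(z^2) is the variance.
    """
    n = X.shape[0]
    Z = X @ W                                  # (n, k) projected data
    m2 = (Z ** 2).mean(axis=0)                 # second moment per column
    m4 = (Z ** 4).mean(axis=0)                 # fourth moment per column
    value = (m4 / m2 ** 2).sum()
    dm4 = 4.0 * X.T @ (Z ** 3) / n             # d m4 / d W, shape (d, k)
    dm2 = 2.0 * X.T @ Z / n                    # d m2 / d W, shape (d, k)
    grad = dm4 / m2 ** 2 - 2.0 * m4 * dm2 / m2 ** 3
    return value, grad


def objective_and_grad(W, X, C_contrast, lam):
    """Assumed objective: contrastive variance minus lam * kurtosis."""
    kurt, g_kurt = kurtosis_and_grad(X, W)
    value = np.trace(W.T @ C_contrast @ W) - lam * kurt
    grad = 2.0 * C_contrast @ W - lam * g_kurt  # C_contrast is symmetric
    return value, grad


def retract(W):
    """Map a full-rank matrix back to orthonormal columns via QR."""
    Q, _ = np.linalg.qr(W)
    return Q


def fit_projection(X, X_background, k=2, alpha=1.0, lam=1.0, n_iter=200, seed=0):
    """Gradient ascent over orthonormal W (hypothetical helper, sketch only)."""
    X = X - X.mean(axis=0)
    B = X_background - X_background.mean(axis=0)
    # Contrastive covariance: target covariance minus scaled background covariance.
    C_contrast = X.T @ X / len(X) - alpha * (B.T @ B / len(B))

    rng = np.random.default_rng(seed)
    W = retract(rng.standard_normal((X.shape[1], k)))
    value, grad = objective_and_grad(W, X, C_contrast, lam)
    history = [value]
    step = 1.0
    for _ in range(n_iter):
        # Project the Euclidean gradient onto the tangent space at W.
        WtG = W.T @ grad
        rgrad = grad - W @ (WtG + WtG.T) / 2.0
        # Backtracking: halve the step until the objective improves.
        while step > 1e-12:
            W_new = retract(W + step * rgrad)
            new_value, new_grad = objective_and_grad(W_new, X, C_contrast, lam)
            if new_value > value:
                break
            step *= 0.5
        else:
            break                              # no improving step was found
        W, value, grad = W_new, new_value, new_grad
        history.append(value)
        step *= 2.0                            # try a larger step next time
    return W, history
```

Under these assumptions, calling `W, history = fit_projection(X, X_background, k=2)` returns a d × 2 matrix with orthonormal columns, and `X @ W` gives a two-dimensional embedding for plotting. The weight `lam` controls the trade-off between discounting the background structure and favouring low-kurtosis (more clearly separated) projections; a dedicated manifold optimization library could replace the hand-written loop.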