UMATO: Bridging Local and Global Structures for Reliable Visual Analytics with Dimensionality Reduction
This work addresses the need for more reliable visual analytics in high-dimensional data analysis, offering an incremental improvement over existing methods like UMAP.
The paper tackles the problem of dimensionality reduction techniques failing to preserve both local and global structures in high-dimensional data, which can mislead analysts. It introduces UMATO, a method that divides UMAP's optimization into two phases to capture both structures, showing it outperforms UMAP and other techniques in global preservation with slight local loss, as validated by quantitative experiments.
Due to the intrinsic complexity of high-dimensional (HD) data, dimensionality reduction (DR) techniques cannot preserve all the structural characteristics of the original data. Therefore, DR techniques focus on preserving either local neighborhood structures (local techniques) or global structures such as pairwise distances between points (global techniques). However, both approaches can mislead analysts to erroneous conclusions about the overall arrangement of manifolds in HD data. For example, local techniques may exaggerate the compactness of individual manifolds, while global techniques may fail to separate clusters that are well-separated in the original space. In this research, we provide a deeper insight into Uniform Manifold Approximation with Two-phase Optimization (UMATO), a DR technique that addresses this problem by effectively capturing local and global structures. UMATO achieves this by dividing the optimization process of UMAP into two phases. In the first phase, it constructs a skeletal layout using representative points, and in the second phase, it projects the remaining points while preserving the regional characteristics. Quantitative experiments validate that UMATO outperforms widely used DR techniques, including UMAP, in terms of global structure preservation, with a slight loss in local structure. We also confirm that UMATO outperforms baseline techniques in terms of scalability and stability against initialization and subsampling, making it more effective for reliable HD data analysis. Finally, we present a case study and a qualitative demonstration that highlight UMATO's effectiveness in generating faithful projections, enhancing the overall reliability of visual analytics using DR.