ScaleMAP: Preserving Local Density and Neighborhood Structure in Low-Dimensional Embeddings
This work addresses the distortion of neighborhood scale in low-dimensional embeddings, a problem that matters most to researchers studying sparse structures in scientific datasets.
Nonlinear dimensionality-reduction methods like UMAP and PaCMAP distort neighborhood scale, suppressing sparse structures. ScaleMAP re-injects scale information by dividing embedding displacements by original-space local radii, recovering sparse bridges in transcriptomic data and faithfully representing density structure across 17 orders of magnitude in flow cytometry.
Nonlinear dimensionality-reduction methods such as UMAP and PaCMAP adaptively normalize local distances during graph construction, erasing neighborhood scale from the resulting graph. This distorts more than relative cluster sizes: sparse structures, such as bridges between transitioning cell types and narrow spectral spikes in hyperspectral images, can be suppressed or lost entirely. DensMAP adds a density penalty to correct this, but the penalty competes with UMAP's attraction-repulsion forces and scatters points far from their neighborhoods. ScaleMAP takes a different approach: each pairwise embedding displacement is divided by the geometric mean of the two endpoints' original-space local radii, re-injecting scale information as a change of variables rather than as a competing objective. Across standard benchmarks and scientific datasets from transcriptomics, hyperspectral imaging, and flow cytometry, ScaleMAP matches DensMAP on density preservation while maintaining UMAP-level neighborhood preservation. In transcriptomic data, it recovers sparse bridges between cell populations that UMAP collapses; in flow cytometry, it faithfully represents density structure across 17 orders of magnitude. The same principle applied to PaCMAP consistently improves density preservation, suggesting that the approach generalizes beyond UMAP.
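A minimal sketch of the change of variables described in the abstract, under stated assumptions. The rescaled displacement is d_ij = (y_i - y_j) / sqrt(r_i * r_j), where y are embedding coordinates and r are original-space local radii. The abstract does not say how the radii are computed, so this sketch assumes r_i is the distance from point i to its k-th nearest neighbor. It also assumes the rescaled displacement is fed into a UMAP-style kernel 1 / (1 + a * d^(2b)), with a = b = 1 (the Cauchy kernel) as placeholder values. All function names are hypothetical and are not ScaleMAP's API.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def local_radii(X: List[Point], k: int) -> List[float]:
    """Assumed radius definition: distance from each point to its k-th nearest
    neighbor in the original space (brute force, O(n^2), for illustration only)."""
    if not 1 <= k < len(X):
        raise ValueError("k must satisfy 1 <= k < number of points")
    radii = []
    for i, xi in enumerate(X):
        dists = sorted(math.dist(xi, xj) for j, xj in enumerate(X) if j != i)
        radii.append(dists[k - 1])
    return radii


def scaled_displacement(Y: List[Point], radii: List[float], i: int, j: int) -> List[float]:
    """Embedding displacement y_i - y_j divided by the geometric mean of the
    endpoints' original-space radii. Duplicate points (zero radius) are not handled."""
    scale = math.sqrt(radii[i] * radii[j])
    return [(a - b) / scale for a, b in zip(Y[i], Y[j])]


def umap_affinity(disp: List[float], a: float = 1.0, b: float = 1.0) -> float:
    """UMAP-style low-dimensional kernel 1 / (1 + a * d^(2b)) evaluated on the
    rescaled displacement; a = b = 1 are placeholders, not UMAP's fitted values."""
    d2 = sum(c * c for c in disp)  # squared norm, so d^(2b) == d2 ** b
    return 1.0 / (1.0 + a * d2 ** b)


# Usage: a dense pair (points 0, 1) and a sparse pair (points 2, 3).
X = [[0.0], [1.0], [5.0], [9.0]]
r = local_radii(X, k=1)  # [1.0, 1.0, 4.0, 4.0]
Y = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [18.0, 0.0]]
print(umap_affinity(scaled_displacement(Y, r, 0, 1)))
print(umap_affinity(scaled_displacement(Y, r, 2, 3)))
```

In this toy example the sparse pair sits four times farther apart in the embedding than the dense pair, yet both receive the same affinity, because each displacement is measured in units of its own local scale. This is how the rescaling lets sparse regions stay spread out without adding a separate density term to the objective.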