Forest-Guided Semantic Transport for Label-Supervised Manifold Alignment
This work addresses the challenge of aligning multimodal datasets that share labels, with applications such as batch correction and biological conservation in single-cell data. It represents an incremental improvement over existing methods.
The paper tackles noisy, semantically misleading structure in label-supervised manifold alignment. It introduces FoSTA, which uses forest-induced geometry to denoise intra-domain structure and recover task-relevant manifolds before alignment. The reported result is improved correspondence recovery and label transfer on synthetic benchmarks, along with strong performance in single-cell applications.
Label-supervised manifold alignment bridges the gap between unsupervised and correspondence-based paradigms by leveraging shared label information to align multimodal datasets. Still, most existing methods rely on Euclidean geometry to model intra-domain relationships. This approach can fail when features are only weakly related to the task of interest, leading to noisy, semantically misleading structure and degraded alignment quality. To address this limitation, we introduce FoSTA (Forest-guided Semantic Transport Alignment), a scalable alignment framework that leverages forest-induced geometry to denoise intra-domain structure and recover task-relevant manifolds prior to alignment. FoSTA builds semantic representations directly from label-informed forest affinities and aligns them via fast, hierarchical semantic transport, capturing meaningful cross-domain relationships. Extensive comparisons with established baselines demonstrate that FoSTA improves correspondence recovery and label transfer on synthetic benchmarks and delivers strong performance in practical single-cell applications, including batch correction and biological conservation.
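To make the idea of a forest-induced geometry concrete, the sketch below computes the classic random-forest proximity: the fraction of trees in which two samples fall into the same leaf. This is an illustrative assumption, not FoSTA's actual affinity construction, which the abstract does not specify. The function name `forest_proximity` and the leaf-assignment table are hypothetical.

```python
from typing import List


def forest_proximity(leaves: List[List[int]]) -> List[List[float]]:
    """Classic random-forest proximity (an assumed stand-in for forest affinities).

    leaves[i][t] is the index of the leaf that sample i reaches in tree t.
    Two samples are close when a label-trained forest routes them to the
    same leaves, regardless of their Euclidean distance in feature space.
    """
    n_samples = len(leaves)
    n_trees = len(leaves[0])
    prox = [[0.0] * n_samples for _ in range(n_samples)]
    for i in range(n_samples):
        for j in range(i, n_samples):
            # Count the trees in which samples i and j share a leaf.
            shared = sum(1 for a, b in zip(leaves[i], leaves[j]) if a == b)
            prox[i][j] = prox[j][i] = shared / n_trees
    return prox


# Hypothetical leaf assignments: 4 samples routed through 3 trees.
leaves = [
    [0, 2, 1],
    [0, 2, 3],
    [1, 0, 3],
    [1, 0, 2],
]
P = forest_proximity(leaves)
```

In practice, a leaf-assignment table of this shape could be obtained from a forest trained on the shared labels, for example via scikit-learn's `RandomForestClassifier.apply`. Because the trees split only on features that help predict the labels, features weakly related to the task contribute little to the resulting proximities, which is the intuition behind using forest geometry instead of Euclidean distances.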