CL AI | Oct 19, 2025

MOSAIC: Masked Objective with Selective Adaptation for In-domain Contrastive Learning

arXiv:2510.16797v1 | h-index: 3

Originality: Incremental advance

AI Analysis

This work addresses domain adaptation for sentence embedding models, an incremental advance aimed at applications in specialized domains.

The paper tackles the adaptation of large-scale general-domain sentence embedding models to specialized domains. It introduces MOSAIC, a multi-stage framework that jointly optimizes masked language modeling and contrastive objectives, and reports improvements of up to 13.4% in NDCG@10 over strong general-domain baselines.
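For readers unfamiliar with the metric, the following is a minimal sketch of how NDCG@10 can be computed for a single query. It is not taken from the paper: the function names are hypothetical, it assumes the linear-gain variant of DCG (some evaluations use an exponential gain instead), and it assumes the input list contains the relevance labels of all judged documents in ranked order.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k relevance labels (linear gain)."""
    # Rank positions are 1-based, so position p is discounted by log2(p + 1).
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=10):
    """NDCG@k: DCG of the system ranking divided by DCG of the ideal ranking.

    Assumption: ranked_relevances holds every judged document for the query,
    so sorting it in descending order yields the ideal ranking.
    """
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg
```

A ranking that places the only relevant document first scores 1.0; placing it second lowers the score because of the logarithmic position discount. Reported figures are typically the mean of this value over all test queries.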

We introduce MOSAIC (Masked Objective with Selective Adaptation for In-domain Contrastive learning), a multi-stage framework for domain adaptation of sentence embedding models that incorporates joint domain-specific masked supervision. Our approach addresses the challenges of adapting large-scale general-domain sentence embedding models to specialized domains. By jointly optimizing masked language modeling (MLM) and contrastive objectives within a unified training pipeline, our method enables effective learning of domain-relevant representations while preserving the robust semantic discrimination properties of the original model. We empirically validate our approach on both high-resource and low-resource domains, achieving improvements up to 13.4% in NDCG@10 (Normalized Discounted Cumulative Gain) over strong general-domain baselines. Comprehensive ablation studies further demonstrate the effectiveness of each component, highlighting the importance of balanced joint supervision and staged adaptation.
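The abstract does not give the exact form of the joint objective, so the following is only an illustrative sketch of one common way to combine a contrastive loss with an MLM loss. It assumes an InfoNCE-style contrastive term with in-batch negatives and a simple weighted sum; the function names, the `temperature` value, and the `mlm_weight` parameter are hypothetical and not from the paper.

```python
import math


def info_nce_loss(sim_matrix, temperature=0.05):
    """InfoNCE with in-batch negatives.

    sim_matrix[i][j] is the similarity between query i and passage j;
    the positive pair for query i is assumed to sit on the diagonal.
    """
    total = 0.0
    for i, row in enumerate(sim_matrix):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(sim_matrix)


def mlm_loss(masked_token_log_probs):
    """Mean negative log-likelihood of the true tokens at masked positions."""
    return -sum(masked_token_log_probs) / len(masked_token_log_probs)


def joint_loss(contrastive, mlm, mlm_weight=0.5):
    """Weighted sum of the two objectives (assumed form, not the paper's)."""
    return contrastive + mlm_weight * mlm
```

In this sketch, `mlm_weight` controls the balance between the two supervision signals: a value of 0 reduces training to the contrastive objective alone, while larger values put more emphasis on domain-specific token prediction.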
