IR AI Nov 28, 2025

Distillation-based Scenario-Adaptive Mixture-of-Experts for the Matching Stage of Multi-scenario Recommendation

Ruibing Wang, Shuhan Guo, Haotong Du, Quanming Yao

arXiv:2601.02368v12.3

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of optimizing user experience across diverse contexts in recommendation systems, but the advance is incremental because it builds on existing MMOE methods.

The paper addresses the problem of adapting Multi-gate Mixture-of-Experts (MMOE) to the matching stage of multi-scenario recommendation, where independent two-tower architectures and the dominance of head scenarios create bottlenecks. It proposes DSMOE, which combines a Scenario-Adaptive Projection module with cross-architecture knowledge distillation. Experiments on real-world datasets show significant improvements in retrieval quality for data-sparse scenarios.

Multi-scenario recommendation is pivotal for optimizing user experience across diverse contexts. While Multi-gate Mixture-of-Experts (MMOE) thrives in ranking, its transfer to the matching stage is hindered by the blind optimization inherent to independent two-tower architectures and the parameter dominance of head scenarios. To address these structural and distributional bottlenecks, we propose Distillation-based Scenario-Adaptive Mixture-of-Experts (DSMOE). Specifically, we devise a Scenario-Adaptive Projection (SAP) module to generate lightweight, context-specific parameters, effectively preventing expert collapse in long-tail scenarios. Concurrently, we introduce a cross-architecture knowledge distillation framework, where an interaction-aware teacher guides the two-tower student to capture complex matching patterns. Extensive experiments on real-world datasets demonstrate DSMOE's superiority, particularly in significantly improving retrieval quality for under-represented, data-sparse scenarios.
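
The abstract does not state the exact distillation objective, so the following is only a minimal sketch of the general idea of a teacher guiding a two-tower student. It assumes a temperature-softened KL divergence over a candidate list, which is a common choice and not necessarily the one used in DSMOE. All names (`student_scores`, `distillation_loss`, `temperature`) and the toy numbers are hypothetical.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def student_scores(user_vec, item_vecs):
    """Two-tower student: user and item embeddings are produced
    independently and only meet in a final dot product."""
    return [dot(user_vec, item) for item in item_vecs]


def distillation_loss(teacher, student, temperature=2.0):
    """Assumed objective: KL(teacher || student) over one candidate list,
    with both score lists softened by the same temperature."""
    p = softmax(teacher, temperature)
    q = softmax(student, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy example: one user, three candidate items (made-up values).
user_vec = [0.5, -0.2]
item_vecs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
# Stand-in for scores from an interaction-aware teacher that sees
# user and item features jointly.
teacher = [2.0, -1.0, 0.5]

loss = distillation_loss(teacher, student_scores(user_vec, item_vecs))
```

In this sketch the loss is zero when the student reproduces the teacher's softened distribution and grows as the two rankings diverge. In training it would typically be added to the student's usual retrieval loss.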
