IRAIOct 17, 2025

MTmixAtt: Integrating Mixture-of-Experts with Multi-Mix Attention for Large-Scale Recommendation

arXiv:2510.15286v13 citationsh-index: 1
Originality Incremental advance
AI Analysis

This work addresses the problem of scalable and transferable recommendation modeling for large-scale industrial applications, representing an incremental improvement with novel integration of existing techniques.

The paper tackles the challenge of manual feature engineering and scenario-specific architectures in industrial recommender systems by proposing MTmixAtt, a unified Mixture-of-Experts architecture with Multi-Mix Attention, which achieves a +3.62% increase in Payment PV and +2.54% increase in Actual Payment GTV in online A/B tests.

Industrial recommender systems critically depend on high-quality ranking models. However, traditional pipelines still rely on manual feature engineering and scenario-specific architectures, which hinder cross-scenario transfer and large-scale deployment. To address these challenges, we propose \textbf{MTmixAtt}, a unified Mixture-of-Experts (MoE) architecture with Multi-Mix Attention, designed for large-scale recommendation tasks. MTmixAtt integrates two key components. The \textbf{AutoToken} module automatically clusters heterogeneous features into semantically coherent tokens, removing the need for human-defined feature groups. The \textbf{MTmixAttBlock} module enables efficient token interaction via a learnable mixing matrix, shared dense experts, and scenario-aware sparse experts, capturing both global patterns and scenario-specific behaviors within a single framework. Extensive experiments on the industrial TRec dataset from Meituan demonstrate that MTmixAtt consistently outperforms state-of-the-art baselines including Transformer-based models, WuKong, HiFormer, MLP-Mixer, and RankMixer. At comparable parameter scales, MTmixAtt achieves superior CTR and CTCVR metrics; scaling to MTmixAtt-1B yields further monotonic gains. Large-scale online A/B tests validate the real-world impact: in the \textit{Homepage} scenario, MTmixAtt increases Payment PV by \textbf{+3.62\%} and Actual Payment GTV by \textbf{+2.54\%}. Overall, MTmixAtt provides a unified and scalable solution for modeling arbitrary heterogeneous features across scenarios, significantly improving both user experience and commercial outcomes.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes