IR · Mar 27

Towards Transfer-Efficient Multi-modal Sequential Recommendation with State Space Duality

arXiv:2506.02916 · 11.5 · h-index: 10 · Has Code
Predicted impact: top 80% in IR · last 90 days
Originality: Incremental advance
AI Analysis

This paper addresses efficiency and accuracy issues in multi-modal sequential recommendation, though it appears incremental because it builds on existing transfer learning approaches.

The paper tackles the problem of slow fine-tuning convergence in transferable multi-modal sequential recommendation models by proposing MMM4Rec, which achieves state-of-the-art performance with 10x faster average convergence speed when transferring to large-scale datasets.

Sequential Recommendation (SR) models infer user preferences from interaction histories. While transferable Multi-modal SR models outperform traditional ID-based approaches, existing methods struggle with slow fine-tuning convergence due to complex optimization requirements and negative transfer effects. We propose MMM4Rec (Multi-Modal Mamba for Sequential Recommendation), a novel Multi-modal SR framework that incorporates a dedicated algebraic constraint mechanism for efficient transfer learning. By combining State Space Duality (SSD)'s temporal decay properties with a globally-aware temporal modeling design, our model dynamically prioritizes key modality information, overcoming limitations of Transformer-based approaches. The framework implements a constrained two-stage process: (1) sequence-level cross-modal alignment via shared projection matrices, followed by (2) temporal fusion using our newly designed Cross-SSD module and dual-channel Fourier adaptive filtering. This architecture maintains semantic consistency while suppressing noise propagation. MMM4Rec achieves rapid fine-tuning convergence with simple cross-entropy loss, significantly improving Multi-modal recommendation accuracy while maintaining strong transferability. Extensive experiments demonstrate MMM4Rec's state-of-the-art performance, achieving strong multi-modal retrieval capability and exhibiting 10x faster average convergence speed when transferring to large-scale downstream datasets. The implementation is available at https://github.com/AlwaysFHao/MMM4Rec .
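The abstract leans on the temporal decay property of State Space Duality (SSD). As a rough illustration only, and not the authors' Cross-SSD module, the sketch below shows the general duality for a scalar-decay state space model: a linear-time recurrence and a quadratic, attention-like sum with a decay mask produce the same outputs. All names (`ssd_recurrent`, `ssd_quadratic`, `a`, `B`, `C`, `x`) and the toy sizes are assumptions made for this example.

```python
import math
import random


def ssd_recurrent(a, B, C, x):
    """Recurrent form: h_t = a_t * h_{t-1} + B_t * x_t, y_t = <C_t, h_t>."""
    h = [0.0] * len(B[0])  # hidden state, one entry per state dimension
    ys = []
    for a_t, B_t, C_t, x_t in zip(a, B, C, x):
        # Older information is shrunk by the decay a_t before new input is added.
        h = [a_t * h_i + b_i * x_t for h_i, b_i in zip(h, B_t)]
        ys.append(sum(c_i * h_i for c_i, h_i in zip(C_t, h)))
    return ys


def ssd_quadratic(a, B, C, x):
    """Dual form: y_t = sum over s <= t of (a_{s+1} * ... * a_t) * <C_t, B_s> * x_s."""
    ys = []
    for t in range(len(x)):
        y_t = 0.0
        for s in range(t + 1):
            # Cumulative decay between positions s and t; empty product is 1 when s == t.
            decay = math.prod(a[s + 1 : t + 1])
            score = sum(c * b for c, b in zip(C[t], B[s]))
            y_t += decay * score * x[s]
        ys.append(y_t)
    return ys


# Toy data (hypothetical sizes): sequence length T, state dimension N.
random.seed(0)
T, N = 6, 4
a = [random.uniform(0.5, 0.99) for _ in range(T)]  # per-step decay in (0, 1)
B = [[random.gauss(0, 1) for _ in range(N)] for _ in range(T)]
C = [[random.gauss(0, 1) for _ in range(N)] for _ in range(T)]
x = [random.gauss(0, 1) for _ in range(T)]

y_rec = ssd_recurrent(a, B, C, x)
y_quad = ssd_quadratic(a, B, C, x)
max_gap = max(abs(p - q) for p, q in zip(y_rec, y_quad))
```

Here `max_gap` should be at floating-point noise level, since both functions compute the same quantity. The decay product is what makes older interactions contribute less to the current output. How MMM4Rec combines this with its globally-aware design and Fourier filtering is described in the paper and repository, not in this sketch.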

