Anchored Alignment: Preventing Positional Collapse in Multimodal Recommender Systems
This work addresses positional collapse in alignment-based multimodal recommender systems, aiming to improve multimodal expressiveness while keeping recommendation accuracy competitive; it represents an incremental advancement.
The paper tackles the problem of positional collapse in multimodal recommender systems by proposing AnchorRec, a framework that uses anchor-based alignment to preserve modality-specific structures while maintaining cross-modal consistency, achieving competitive top-N recommendation accuracy on four Amazon datasets.
Multimodal recommender systems (MMRSs) leverage images, text, and interaction signals to enrich item representations. However, recent alignment-based MMRSs that enforce a unified embedding space often blur modality-specific structures and exacerbate ID dominance. To address this, we propose AnchorRec, a multimodal recommendation framework that performs indirect, anchor-based alignment in a lightweight projection domain. By decoupling alignment from representation learning, AnchorRec preserves each modality's native structure while maintaining cross-modal consistency and avoiding positional collapse. Experiments on four Amazon datasets show that AnchorRec achieves competitive top-N recommendation accuracy, while qualitative analyses demonstrate improved multimodal expressiveness and coherence. The codebase of AnchorRec is available at https://github.com/hun9008/AnchorRec.
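To make the idea of indirect alignment in a projection domain concrete, the following is a minimal NumPy sketch of one possible reading of it, not the AnchorRec implementation. Everything in it is an assumption for illustration: the function names `l2_normalize` and `anchor_alignment_loss`, the use of cosine distance, the linear projection matrices, the embedding sizes, and the randomly generated anchor. The abstract does not specify how the anchor is built or which loss is used; see the linked codebase for the actual method.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def anchor_alignment_loss(modality_embs, anchor, projections, anchor_projection):
    """Mean cosine distance between each projected modality and the projected anchor.

    Assumption (hypothetical, not from the paper): alignment is measured only
    on projected copies, so the original modality embeddings are never
    compared with each other directly.
    """
    projected_anchor = l2_normalize(anchor @ anchor_projection)
    losses = {}
    for name, emb in modality_embs.items():
        projected = l2_normalize(emb @ projections[name])
        cosine = np.sum(projected * projected_anchor, axis=1)
        losses[name] = float(np.mean(1.0 - cosine))
    return losses


rng = np.random.default_rng(0)
n_items, d_proj = 5, 4  # toy sizes, chosen arbitrarily

# Each modality keeps its own native dimensionality.
modality_embs = {
    "id": rng.normal(size=(n_items, 8)),
    "image": rng.normal(size=(n_items, 16)),
    "text": rng.normal(size=(n_items, 12)),
}
# Hypothetical anchor; random here only to keep the sketch self-contained.
anchor = rng.normal(size=(n_items, 8))

# Lightweight linear projections into a shared low-dimensional space.
projections = {
    name: 0.1 * rng.normal(size=(emb.shape[1], d_proj))
    for name, emb in modality_embs.items()
}
anchor_projection = 0.1 * rng.normal(size=(anchor.shape[1], d_proj))

# Keep copies to show that computing the loss leaves the embeddings untouched.
snapshot = {name: emb.copy() for name, emb in modality_embs.items()}
losses = anchor_alignment_loss(modality_embs, anchor, projections, anchor_projection)
print(losses)
```

In a trainable version, these per-modality terms would presumably be added to the recommendation objective. How gradients are routed between the projection heads and the modality encoders is not stated in the abstract, so the sketch leaves that open.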