IR · May 21

Behavior-Guided Candidate Calibration for Multimodal Recommendation

arXiv:2605.22073 · 83.9 · Has Code

AI Analysis

For practitioners of multimodal recommendation, this work offers a method to improve ranking by leveraging behavior signals without destabilizing the representation space.

The paper identifies that moderate cross-view agreement in multimodal recommendation is beneficial, while strong agreement suppresses discriminative signals. It introduces a behavior-guided candidate calibration model that applies signed candidate evidence only to the shortlist, achieving consistent gains over strong baselines on Amazon datasets.

Multimodal recommendation benefits from content signals, but the gain depends on how those signals interact with the ranking pipeline. We find that moderate cross-view agreement helps, while stronger agreement suppresses recommendation-specific variation. Spectral analysis shows a clear split: low-frequency components capture shared structure, and higher-frequency components preserve more discriminative signal. Based on this finding, we introduce a behavior-guided candidate calibration model that converts training-only co-user overlap into signed candidate evidence and applies it only to the shortlist produced by the multimodal backbone. The backbone keeps the representation space stable; behavior evidence acts only where ranking is decided. Results on Amazon Baby, Sports, and Electronics show consistent gains over strong multimodal baselines. Code is available at https://github.com/LIZESHENG13/bridge.
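The shortlist-only calibration described above can be illustrated with a minimal sketch. It is not the authors' implementation: the Jaccard overlap measure, the mean-centering used to make the evidence signed, the additive fusion weight `alpha`, and every function and variable name below are assumptions made for illustration. The backbone scores are treated as given, and only the top-`k` candidates are re-ordered.

```python
from collections import defaultdict


def build_item_users(train):
    """Invert training interactions (user -> items) into item -> users."""
    item_users = defaultdict(set)
    for user, items in train.items():
        for item in items:
            item_users[item].add(user)
    return item_users


def jaccard(a, b):
    """Jaccard overlap of two user sets; 0.0 when both are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def calibrate_shortlist(user, backbone_scores, train, item_users, k=2, alpha=0.5):
    """Re-rank only the top-k backbone candidates using co-user overlap.

    Assumption: evidence is the mean Jaccard overlap between a candidate's
    training users and the users of each item in the target user's history,
    centered on the shortlist mean so that it is signed.
    """
    history = train.get(user, set())
    ranked = sorted(backbone_scores, key=backbone_scores.get, reverse=True)
    shortlist, tail = ranked[:k], ranked[k:]

    raw = {}
    for cand in shortlist:
        # Exclude the target user so they do not count as their own co-user.
        cand_users = item_users.get(cand, set()) - {user}
        overlaps = [
            jaccard(cand_users, item_users.get(h, set()) - {user})
            for h in history
        ]
        raw[cand] = sum(overlaps) / len(overlaps) if overlaps else 0.0

    mean_raw = sum(raw.values()) / len(raw) if raw else 0.0
    evidence = {c: raw[c] - mean_raw for c in shortlist}

    # Backbone scores are untouched outside the shortlist; the tail keeps its order.
    reranked = sorted(
        shortlist,
        key=lambda c: backbone_scores[c] + alpha * evidence[c],
        reverse=True,
    )
    return reranked + tail, evidence


# Toy training interactions (hypothetical data).
train = {"u1": {"a", "b"}, "u2": {"a", "c"}, "u3": {"b", "c"}, "u4": {"d"}}
item_users = build_item_users(train)

# Hypothetical backbone scores for u1's unseen candidates.
backbone_scores = {"d": 0.90, "c": 0.85, "e": 0.20}

order, evidence = calibrate_shortlist("u1", backbone_scores, train, item_users)
print(order)     # ['c', 'd', 'e']
print(evidence)  # {'d': -0.25, 'c': 0.25}
```

In this toy case, item `c` shares co-users with both items in `u1`'s history and receives positive evidence, while `d` shares none and receives negative evidence, so the two swap places within the shortlist. Item `e` lies outside the shortlist and keeps its backbone position.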
