IR | AI | LG | Jun 25, 2023

Mining Stable Preferences: Adaptive Modality Decorrelation for Multimedia Recommendation

arXiv:2306.14179v1 | 13 citations | h-index: 91
Originality: Incremental advance
AI Analysis

This work addresses robust multimedia recommendation in dynamic web environments. It is an incremental improvement, delivered as a plug-and-play decorrelation module.

The paper tackles the problem of learning stable user preferences in multimedia recommendation. It targets spurious correlations between modalities, which cause performance drops under distribution shift, and proposes MODEST, a decorrelation module reported to improve recommendation performance across four datasets and four backbones.

Multimedia content is predominant in the modern Web era. In real scenarios, multiple modalities reveal different aspects of item attributes and usually differ in their importance to users' purchase decisions. However, it is difficult for models to identify users' true preferences towards different modalities because the modalities are strongly statistically correlated. Even worse, this strong correlation may mislead models into learning spurious preferences towards inconsequential modalities. As a result, when the distribution of the data (modal features) shifts, the learned spurious preferences are not guaranteed to be as effective on the inference set as on the training set. We propose a novel MOdality DEcorrelating STable learning framework, MODEST for brevity, to learn users' stable preferences. Inspired by sample re-weighting techniques, the proposed method estimates a weight for each item such that the features from different modalities are decorrelated in the weighted distribution. We adopt the Hilbert-Schmidt Independence Criterion (HSIC) as the independence measure; it is a kernel-based method capable of evaluating the degree of dependence, including non-linear dependence, between two multi-dimensional variables. Our method can serve as a plug-and-play module for existing multimedia recommendation backbones. Extensive experiments on four public datasets and four state-of-the-art multimedia recommendation backbones show that the proposed method improves performance by a large margin.
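To make the re-weighting idea in the abstract concrete, here is a minimal NumPy sketch of a weighted HSIC between two modality feature matrices. It is an illustration under stated assumptions, not the paper's implementation: the function names `rbf_kernel` and `weighted_hsic`, the Gaussian (RBF) kernel, the bandwidth `sigma`, and the particular weighted estimator (HSIC computed under the item-weighted empirical distribution) are all choices made here for illustration. With uniform weights it reduces to the standard biased HSIC estimate, tr(KHLH)/n^2.

```python
import numpy as np


def rbf_kernel(x, sigma=1.0):
    """Gaussian kernel matrix for the rows of x (shape: n_items x dim)."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))


def weighted_hsic(x, y, w=None, sigma=1.0):
    """HSIC between x and y under item weights w (hypothetical helper).

    w is normalised to sum to 1; w=None means uniform weights, which
    gives the usual biased estimator tr(K H L H) / n^2.
    """
    n = x.shape[0]
    if w is None:
        w = np.full(n, 1.0 / n)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    # Weighted centring matrix: subtracts the w-weighted mean embedding.
    H = np.eye(n) - np.outer(np.ones(n), w)
    Kc = H @ rbf_kernel(x, sigma) @ H.T
    Lc = H @ rbf_kernel(y, sigma) @ H.T
    # Squared Hilbert-Schmidt norm of the weighted cross-covariance.
    return float(w @ (Kc * Lc) @ w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    visual = rng.normal(size=(200, 4))                    # synthetic "visual" features
    textual = visual + 0.1 * rng.normal(size=(200, 4))    # strongly dependent modality
    unrelated = rng.normal(size=(200, 4))                 # independent modality
    print("dependent  :", weighted_hsic(visual, textual))
    print("independent:", weighted_hsic(visual, unrelated))
```

In a setting like the one the abstract describes, the weights `w` would be treated as learnable parameters and adjusted to reduce this quantity across modality pairs, then used to re-weight each item's contribution to the recommendation loss. How the paper parameterises and optimises the weights is not stated in the abstract, so that step is left out here.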

