GUME: Graphs and User Modalities Enhancement for Long-Tail Multimodal Recommendation
GUME addresses the challenge of improving recommendation accuracy for long-tail items and of modeling user modality preferences in multimodal systems. It is an incremental advance over prior methods.
The paper tackles two problems in multimodal recommendation systems: long-tail items with limited interaction data, and overly simplistic user modality representations. It proposes GUME, which enhances both the user-item graph and the user modalities, and reports state-of-the-art performance on four datasets, with improvements in metrics such as Recall@20 and NDCG@20.
Multimodal recommendation systems (MMRS) have received considerable attention from the research community because they can jointly exploit user behavior data together with product images and text. Previous research has two main issues. First, many long-tail items in recommendation systems have limited interaction data, which makes it difficult to learn comprehensive and informative representations for them, yet past MMRS studies have overlooked this issue. Second, users' modality preferences are crucial to their behavior, but previous research has focused primarily on learning item modality representations, leaving user modality representations relatively simplistic. To address these challenges, we propose a novel Graphs and User Modalities Enhancement (GUME) method for long-tail multimodal recommendation. Specifically, we first enhance the user-item graph using multimodal similarity between items. This improves the connectivity of long-tail items and helps them learn high-quality representations through graph propagation. Then, we construct two types of user modalities: explicit interaction features and extended interest features. By using a user modality enhancement strategy that maximizes the mutual information between these two features, we improve the generalization ability of user modality representations. Additionally, we design an alignment strategy for modality data to remove noise from both internal and external perspectives. Extensive experiments on four publicly available datasets demonstrate the effectiveness of our approach.
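The two core ideas in the abstract, graph enhancement through item similarity and mutual information maximization between two user feature views, can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how similarity is computed or how mutual information is estimated, so the sketch assumes cosine similarity with top-k neighbors for the item-item edges and an InfoNCE-style contrastive loss as a stand-in for the mutual information objective. The function names (`knn_item_graph`, `enhance_user_item_graph`, `info_nce`), the temperature value, and the toy data are all hypothetical.

```python
import numpy as np


def knn_item_graph(features, k):
    """Binary item-item adjacency linking each item to its k most similar
    items by cosine similarity (assumed similarity measure)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    normed = features / np.clip(norms, 1e-12, None)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # an item is never its own neighbor
    top_k = np.argsort(-sim, axis=1)[:, :k]
    n = features.shape[0]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), top_k.ravel()] = 1.0
    return adj


def enhance_user_item_graph(interactions, item_adj):
    """Joint adjacency over users and items: observed user-item edges
    plus symmetrized item-item similarity edges."""
    n_users, n_items = interactions.shape
    size = n_users + n_items
    graph = np.zeros((size, size))
    graph[:n_users, n_users:] = interactions
    graph[n_users:, :n_users] = interactions.T
    graph[n_users:, n_users:] = np.maximum(item_adj, item_adj.T)
    return graph


def info_nce(view_a, view_b, temperature=0.2):
    """InfoNCE-style loss: row i of view_a should match row i of view_b.
    Lower loss corresponds to a higher mutual information lower bound."""
    a = view_a / np.clip(np.linalg.norm(view_a, axis=1, keepdims=True), 1e-12, None)
    b = view_b / np.clip(np.linalg.norm(view_b, axis=1, keepdims=True), 1e-12, None)
    logits = a @ b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


# Toy data: 2 users, 4 items. Item 3 is long-tail (no interactions).
interactions = np.array([[1.0, 1.0, 0.0, 0.0],
                         [1.0, 0.0, 1.0, 0.0]])
# Hypothetical fused modality features (e.g. from image and text encoders).
item_features = np.array([[1.0, 0.0],
                          [0.9, 0.1],
                          [0.0, 1.0],
                          [0.1, 0.9]])

item_adj = knn_item_graph(item_features, k=1)
graph = enhance_user_item_graph(interactions, item_adj)

n_users = interactions.shape[0]
long_tail_node = n_users + 3  # index of item 3 in the joint graph
degree_before = interactions[:, 3].sum()
degree_after = graph[long_tail_node].sum()

# Two hypothetical user feature views (explicit vs. extended interest).
explicit_view = np.eye(3)
aligned_loss = info_nce(explicit_view, explicit_view)
misaligned_loss = info_nce(explicit_view, explicit_view[::-1])
```

In this toy setup the long-tail item has no edges in the original interaction matrix but gains a neighbor in the enhanced graph, which is what lets graph propagation reach it. The contrastive loss is lower when the two user views agree row by row than when they are mismatched.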