Multi-modal Relational Item Representation Learning for Inferring Substitutable and Complementary Items
The work addresses noisy user behavior data and data sparsity in item recommendation, particularly for cold-start items, but it is incremental because it builds on existing multi-modal and self-supervised approaches.
The paper infers substitutable and complementary items with a self-supervised multi-modal relational item representation learning framework, and reports improvements over existing baselines of 26.1% for substitutable recommendation and 39.2% for complementary recommendation.
Existing approaches to inferring substitutable and complementary items primarily model item-item associations deduced from user behaviors using graph neural networks (GNNs), or leverage item content information. However, these methods often overlook critical challenges, such as noisy user behavior data and data sparsity caused by the long-tailed distribution of these behaviors. To address these challenges, we propose MMSC, a novel self-supervised multi-modal relational item representation learning framework. MMSC consists of three main components: (1) a multi-modal item representation learning module that leverages a multi-modal foundational model and learns from item metadata, (2) a self-supervised behavior-based representation learning module that denoises and learns from user behavior data, and (3) a hierarchical representation aggregation mechanism that integrates item representations at both the semantic and task levels. Additionally, we leverage large language models (LLMs) to generate augmented training data, further enhancing the denoising process during training. Extensive experiments on five real-world datasets show that MMSC outperforms existing baselines by 26.1% for substitutable recommendation and 39.2% for complementary recommendation. We also show empirically that MMSC is effective in modeling cold-start items.
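As a rough illustration of the two-level aggregation idea, the sketch below fuses a content-based vector and a behavior-based vector for each item (semantic level) and then applies a separate linear head per relation type (task level). It is only a minimal sketch under stated assumptions, not the MMSC implementation: the attention-style gate, the per-task linear heads, the dot-product score, and all names (semantic_aggregate, task_project, relation_score, query_vec, head) are hypothetical, since the abstract does not specify these details.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def semantic_aggregate(content_vec, behavior_vec, query_vec):
    """Semantic level (assumed form): attention-weighted mix of two item views.

    query_vec is a hypothetical learned vector that scores each view.
    """
    weights = softmax([dot(query_vec, content_vec), dot(query_vec, behavior_vec)])
    return [weights[0] * c + weights[1] * b
            for c, b in zip(content_vec, behavior_vec)]


def task_project(vec, head):
    """Task level (assumed form): one linear map per relation type.

    head is a matrix given as a list of rows.
    """
    return [dot(row, vec) for row in head]


def relation_score(item_a, item_b, head):
    """Score a candidate pair under one relation (substitute or complement)."""
    return dot(task_project(item_a, head), task_project(item_b, head))


if __name__ == "__main__":
    # Toy 2-dimensional vectors; real embeddings would come from trained encoders.
    query = [0.5, -0.5]
    item_a = semantic_aggregate([1.0, 0.0], [0.0, 1.0], query)
    item_b = semantic_aggregate([0.8, 0.2], [0.1, 0.9], query)
    substitute_head = [[1.0, 0.0], [0.0, 1.0]]
    complement_head = [[0.0, 1.0], [1.0, 0.0]]
    print("substitute score:", relation_score(item_a, item_b, substitute_head))
    print("complement score:", relation_score(item_a, item_b, complement_head))
```

One property of this gate is that a cold-start item with an uninformative behavior vector can still get a usable representation from its content view, which matches the motivation stated in the abstract; whether MMSC handles cold-start items this way is not stated in the source.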