CV · Apr 21, 2025

Cognitive-Inspired Hierarchical Attention Fusion With Visual and Textual for Cross-Domain Sequential Recommendation

Wangyu Wu, Zhenhong Chen, Siqi Song, Xianglin Qiu, Xiaowei Huang, Fei Ma, Jimin Xiao

arXiv:2504.15085v58.44 citations · h-index: 10 · CogSci

Originality: Incremental advance

AI Analysis

The paper addresses the problem of predicting user behavior across multiple domains in e-commerce applications. Its contribution is incremental, since it builds on existing multimodal and attention-based approaches.

The paper tackles cross-domain sequential recommendation by proposing HAF-VT, a method that integrates visual and textual data through hierarchical attention. The authors report that it outperforms existing methods on four e-commerce datasets.

Cross-Domain Sequential Recommendation (CDSR) predicts user behavior by leveraging historical interactions across multiple domains, focusing on modeling cross-domain preferences through intra- and inter-sequence item relationships. Inspired by human cognitive processes, we propose Hierarchical Attention Fusion of Visual and Textual Representations (HAF-VT), a novel approach integrating visual and textual data to enhance cognitive modeling. Using the frozen CLIP model, we generate image and text embeddings, enriching item representations with multimodal data. A hierarchical attention mechanism jointly learns single-domain and cross-domain preferences, mimicking human information integration. Evaluated on four e-commerce datasets, HAF-VT outperforms existing methods in capturing cross-domain user interests, bridging cognitive principles with computational models and highlighting the role of multimodal data in sequential decision-making.
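
The abstract does not spell out how the fusion or the attention layers are built, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each item already has an image vector and a text vector (tiny hand-written stand-ins for frozen CLIP embeddings), fuses them with a simple element-wise mean, and then applies plain scaled dot-product attention at two levels: within each domain's sequence, and across the resulting per-domain preference vectors. The names `fuse`, `domain_preference`, and `user_preference`, the mean fusion, and the choice of queries are all hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, keys, values):
    # Scaled dot-product attention for a single query vector.
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def fuse(image_vec, text_vec):
    # Assumption: element-wise mean of the image and text embeddings.
    return [(a + b) / 2.0 for a, b in zip(image_vec, text_vec)]

def domain_preference(items):
    # Lower level: the most recent item attends over its own domain's sequence.
    return attend(items[-1], items, items)

def user_preference(domain_sequences):
    # Upper level: attention across the per-domain preference vectors,
    # using their mean as the query (an illustrative choice).
    prefs = [domain_preference(seq) for seq in domain_sequences.values()]
    query = [sum(col) / len(prefs) for col in zip(*prefs)]
    return attend(query, prefs, prefs)

# Toy stand-ins for frozen CLIP image/text embeddings (real ones are far larger).
books = [
    fuse([1.0, 0.0, 0.0], [0.8, 0.2, 0.0]),
    fuse([0.9, 0.1, 0.0], [1.0, 0.0, 0.0]),
]
movies = [fuse([0.0, 1.0, 0.0], [0.1, 0.9, 0.0])]

user = user_preference({"books": books, "movies": movies})

# Score hypothetical candidate items by dot product with the user vector.
candidates = {"novel": [1.0, 0.0, 0.0], "camera": [0.0, 0.0, 1.0]}
scores = {name: dot(user, vec) for name, vec in candidates.items()}
```

In this toy setup the user vector is a weighted average of the interacted items, so a candidate that resembles past interactions (`novel`) scores higher than an unrelated one (`camera`). A trained model would replace the fixed fusion and parameter-free attention with learned projections.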
