IR | AI | Feb 16, 2024

UMAIR-FPS: User-aware Multi-modal Animation Illustration Recommendation Fusion with Painting Style

arXiv:2402.10381v2 | 1 citation | h-index: 2 | DASFAA
AI Analysis

This work addresses anime illustration recommendation by improving multi-modal fusion. Its contribution is incremental: it builds on existing recommendation systems and adds a new way of integrating features.

The paper tackles the challenge of recommending anime illustrations by integrating image painting style features with semantic features and dynamically weighting multi-modal features based on user interactions, achieving substantial performance enhancements over state-of-the-art baselines on large real-world datasets.

The rapid advancement of high-quality AI-based image generation models has produced a deluge of anime illustrations. Recommending illustrations to users from this massive volume of data has become a challenging and popular task. However, existing anime recommendation systems have focused on text features and have yet to integrate image features. In addition, most multi-modal recommendation research is constrained by tightly coupled datasets, which limits its applicability to anime illustrations. We propose the User-aware Multi-modal Animation Illustration Recommendation Fusion with Painting Style (UMAIR-FPS) to address these gaps. In the feature extraction phase, for image features, we are the first to combine image painting style features with semantic features, constructing a dual-output image encoder that enhances representation. For text features, we obtain text embeddings by fine-tuning Sentence-Transformers with domain knowledge, composing a variety of domain text pairs from three perspectives: multilingual mappings, entity relationships, and term explanations. In the multi-modal fusion phase, we propose a novel user-aware multi-modal contribution measurement mechanism that dynamically weights multi-modal features according to user features at the interaction level, and we employ the DCN-V2 module to model bounded-degree multi-modal crosses effectively. UMAIR-FPS surpasses the state-of-the-art baselines on large real-world datasets, demonstrating substantial performance enhancements.
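The fusion phase described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The softmax gate, the helper names (`modality_weights`, `fuse`, `cross_layer`), the dimensions, and the random parameters are all assumptions made for illustration. Only the cross-layer formula, x_{l+1} = x0 * (W x_l + b) + x_l, follows the published DCN-V2 definition.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def modality_weights(user_vec, gate_matrix):
    """Assumed gate: one score per modality from the user vector, normalised to sum to 1."""
    return softmax(matvec(gate_matrix, user_vec))


def fuse(user_vec, modality_vecs, gate_matrix):
    """Scale each modality embedding by its user-dependent weight, then concatenate."""
    weights = modality_weights(user_vec, gate_matrix)
    fused = []
    for weight, vec in zip(weights, modality_vecs):
        fused.extend(weight * v for v in vec)
    return fused, weights


def cross_layer(x0, xl, weight, bias):
    """DCN-V2 cross layer: x_{l+1} = x0 * (W @ x_l + b) + x_l, with an elementwise product."""
    projected = matvec(weight, xl)
    return [a * (p + b) + c for a, p, b, c in zip(x0, projected, bias, xl)]


# Toy example: hypothetical dimensions and random stand-in parameters.
rng = random.Random(0)
user = [rng.uniform(-1, 1) for _ in range(4)]
style, semantic, text = ([rng.uniform(-1, 1) for _ in range(3)] for _ in range(3))
gate = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]  # 3 modalities x user dim

x0, weights = fuse(user, [style, semantic, text], gate)
dim = len(x0)
w = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]
b = [0.0] * dim

x1 = cross_layer(x0, x0, w, b)  # first cross layer: degree-2 interactions
x2 = cross_layer(x0, x1, w, b)  # second layer reuses w and b only to keep the toy short
```

Stacking L cross layers bounds the interaction degree at L + 1, which is one reading of "bounded-degree multi-modal crosses". In a trained model each layer would have its own learned `weight` and `bias`, and the gate parameters would be learned jointly with the rest of the network.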

Code Implementations: 1 repo