LG · IR · Nov 24, 2025

From Raw Features to Effective Embeddings: A Three-Stage Approach for Multimodal Recipe Recommendation

arXiv:2511.19176v2
Originality: Incremental advance
AI Analysis

This work addresses recipe recommendation for web-based food platforms, offering an incremental improvement through systematic enhancement of multimodal signals.

The paper tackles the challenge of leveraging multimodal features for recipe recommendation by proposing TESMR, a three-stage framework that progressively refines raw multimodal features into effective embeddings. On two real-world datasets, TESMR achieves 7-15% higher Recall@10 than existing methods.
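The headline metric is Recall@10. The following is a minimal sketch of how Recall@k is commonly computed in top-k recommendation. It assumes two things: training interactions are masked out before ranking, and recall is normalized by each user's number of held-out items. The paper's exact evaluation protocol is not given here, and `recall_at_k` is a hypothetical name.

```python
import numpy as np


def recall_at_k(scores, train_mask, test_items, k=10):
    """Mean Recall@k over users that have at least one held-out item.

    scores:     (num_users, num_items) predicted preference scores.
    train_mask: boolean (num_users, num_items), True where the item was seen
                in training; such items are excluded from the ranking.
    test_items: list of sets with each user's held-out relevant item ids.
    """
    k = min(k, scores.shape[1])
    masked = np.where(train_mask, -np.inf, scores)
    # Unordered indices of the k highest-scoring items for every user.
    top_k = np.argpartition(-masked, k - 1, axis=1)[:, :k]
    recalls = []
    for user, relevant in enumerate(test_items):
        if not relevant:
            continue  # skip users with nothing to retrieve
        hits = len(relevant.intersection(top_k[user].tolist()))
        recalls.append(hits / len(relevant))
    return float(np.mean(recalls)) if recalls else 0.0
```

This version divides by the number of held-out items rather than by k. As a result, a user with more relevant recipes than k can never reach 1.0. Some evaluations instead divide by min(k, |relevant|).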

Recipe recommendation has become an essential task in web-based food platforms. A central challenge is effectively leveraging rich multimodal features beyond user-recipe interactions. Our analysis shows that even simple uses of multimodal signals yield competitive performance, suggesting that systematic enhancement of these signals is highly promising. We propose TESMR, a 3-stage framework for recipe recommendation that progressively refines raw multimodal features into effective embeddings through: (1) content-based enhancement using foundation models with multimodal comprehension, (2) relation-based enhancement via message propagation over user-recipe interactions, and (3) learning-based enhancement through contrastive learning with learnable embeddings. Experiments on two real-world datasets show that TESMR outperforms existing methods, achieving 7-15% higher Recall@10.
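The abstract names the three stages but does not give their formulas. The sketch below chains stages 2 and 3 under explicit assumptions:

- Stage-1 foundation-model features are taken as given recipe vectors.
- Stage 2 is assumed to be LightGCN-style symmetric-normalized propagation over the user-recipe graph, with users starting from zero features.
- Stage 3 is assumed to be an InfoNCE loss between the propagated feature embeddings and learnable ID embeddings of the same dimension.

All function names are hypothetical. Only the loss value is computed; actually learning the ID embeddings would require an autodiff framework.

```python
import numpy as np


def normalized_adjacency(interactions):
    """Symmetric-normalized bipartite adjacency D^{-1/2} A D^{-1/2}.

    interactions: (num_users, num_recipes) 0/1 matrix of observed interactions.
    Nodes are ordered users first, then recipes.
    """
    n_u, n_r = interactions.shape
    adj = np.zeros((n_u + n_r, n_u + n_r))
    adj[:n_u, n_u:] = interactions
    adj[n_u:, :n_u] = interactions.T
    deg = adj.sum(axis=1)
    # Isolated nodes get a zero scale instead of a division by zero.
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    return inv_sqrt[:, None] * adj * inv_sqrt[None, :]


def propagate(node_feats, norm_adj, num_layers=2):
    """Stage 2 (assumed LightGCN-style): average the outputs of layers 0..L."""
    h = node_feats
    layers = [h]
    for _ in range(num_layers):
        h = norm_adj @ h  # each node aggregates its neighbours' features
        layers.append(h)
    return np.mean(layers, axis=0)


def info_nce(z_a, z_b, temperature=0.2):
    """Stage 3 (assumed InfoNCE): row i of z_a and row i of z_b form the
    positive pair; all other rows of z_b act as in-batch negatives."""
    z_a = z_a / np.maximum(np.linalg.norm(z_a, axis=1, keepdims=True), 1e-12)
    z_b = z_b / np.maximum(np.linalg.norm(z_b, axis=1, keepdims=True), 1e-12)
    logits = z_a @ z_b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def enhance(recipe_feats, interactions, id_emb, num_layers=2, temperature=0.2):
    """Propagate stage-1 recipe features over the user-recipe graph, then
    score them against learnable ID embeddings (one row per node)."""
    n_u = interactions.shape[0]
    # Assumption: users have no content features, so they start at zero
    # and acquire features only through propagation.
    user_feats = np.zeros((n_u, recipe_feats.shape[1]))
    node_feats = np.vstack([user_feats, recipe_feats])
    enhanced = propagate(node_feats, normalized_adjacency(interactions), num_layers)
    loss = info_nce(enhanced, id_emb, temperature)
    return enhanced, loss
```

Averaging over layers keeps some of each node's own stage-1 signal while mixing in neighbours' features. The contrastive term pulls each node's learnable ID embedding toward its feature-derived embedding and away from those of other nodes.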
