IR | CL | May 14, 2025

A Survey on Large Language Models in Multimodal Recommender Systems

arXiv:2505.09777v1 | 10 citations | h-index: 5
Originality: Synthesis-oriented
AI Analysis

The paper addresses how to improve multimodal recommendation, for an audience of researchers and practitioners. As a survey, its contribution is an incremental synthesis rather than a novel method.

This survey reviews how large language models (LLMs) are integrated into multimodal recommender systems (MRS) to improve recommendation performance through semantic reasoning and greater flexibility. It proposes a taxonomy of integration patterns, identifies techniques transferable from related recommendation domains, and points to directions for future research.

Multimodal recommender systems (MRS) integrate heterogeneous user and item data, such as text, images, and structured information, to enhance recommendation performance. The emergence of large language models (LLMs) introduces new opportunities for MRS by enabling semantic reasoning, in-context learning, and dynamic input handling. Compared to earlier pre-trained language models (PLMs), LLMs offer greater flexibility and generalisation capabilities but also introduce challenges related to scalability and model accessibility. This survey presents a comprehensive review of recent work at the intersection of LLMs and MRS, focusing on prompting strategies, fine-tuning methods, and data adaptation techniques. We propose a novel taxonomy to characterise integration patterns, identify transferable techniques from related recommendation domains, provide an overview of evaluation metrics and datasets, and point to possible future directions. We aim to clarify the emerging role of LLMs in multimodal recommendation and support future research in this rapidly evolving field.

