CV · Nov 25, 2024

VisualLens: Personalization through Task-Agnostic Visual History

arXiv:2411.16034v2
h-index: 18
Originality: Incremental advance
AI Analysis

This work addresses the problem of inaccessible or non-generalizable user histories in multimodal recommendation systems. The advance is incremental because it builds on existing MLLM capabilities.

The paper tackles the problem of personalizing recommendations without relying on user interaction logs by using task-agnostic visual histories. It shows that VisualLens improves over state-of-the-art item-based multimodal recommendations by 5-10% on Hit@3 and outperforms GPT-4o by 2-5%.

Existing recommendation systems either rely on user interaction logs, such as online shopping history for shopping recommendations, or focus on text signals. However, item-based histories are not always accessible, and are not generalizable for multimodal recommendation. We hypothesize that a user's visual history -- comprising images from daily life -- can offer rich, task-agnostic insights into their interests and preferences, and thus be leveraged for effective personalization. To this end, we propose VisualLens, a novel framework that leverages multimodal large language models (MLLMs) to enable personalization using task-agnostic visual history. VisualLens extracts, filters, and refines a spectrum user profile from the visual history to support personalized recommendation. We created two new benchmarks, Google-Review-V and Yelp-V, with task-agnostic visual histories, and show that VisualLens improves over state-of-the-art item-based multimodal recommendations by 5-10% on Hit@3, and outperforms GPT-4o by 2-5%. Further analysis shows that VisualLens is robust across varying history lengths and excels at adapting to both longer histories and unseen content categories.
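The Hit@3 figures above refer to a top-k ranking metric. As a minimal sketch, assuming the common definition (a query counts as a hit when at least one relevant item appears among the top k ranked candidates), the metric could be computed as follows. The function names `hit_at_k` and `mean_hit_at_k` and the toy data are illustrative assumptions, not taken from the paper, whose exact evaluation protocol may differ.

```python
from typing import Sequence, Set


def hit_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 3) -> float:
    """Return 1.0 if any of the top-k ranked items is relevant, else 0.0.

    Assumed definition of Hit@k; the paper's protocol may differ in detail.
    """
    return float(any(item in relevant for item in ranked[:k]))


def mean_hit_at_k(
    rankings: Sequence[Sequence[str]],
    relevants: Sequence[Set[str]],
    k: int = 3,
) -> float:
    """Average Hit@k over a set of queries (one ranking per query)."""
    if len(rankings) != len(relevants):
        raise ValueError("rankings and relevants must have the same length")
    if not rankings:
        return 0.0
    total = sum(hit_at_k(r, g, k) for r, g in zip(rankings, relevants))
    return total / len(rankings)


# Hypothetical toy data: two queries, each with four ranked candidates.
rankings = [["a", "b", "c", "d"], ["x", "y", "z", "w"]]
relevants = [{"c"}, {"w"}]
score = mean_hit_at_k(rankings, relevants, k=3)
```

In this toy case the first query has a relevant item at rank 3 (a hit) and the second only at rank 4 (a miss), so half of the queries count as hits at k = 3.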

