CV | Nov 25, 2024

VisualLens: Personalization through Task-Agnostic Visual History

Wang Bill Zhu, Deqing Fu, Kai Sun, Yi Lu, Zhaojiang Lin, Seungwhan Moon, Kanika Narang, Mustafa Canim, Yue Liu, Anuj Kumar, Xin Luna Dong

arXiv:2411.16034v22.0 h-index: 18

Originality: Incremental advance

AI Analysis

This work addresses the problem that item-based user histories are often inaccessible or do not generalize in multimodal recommendation, though the advance is incremental because it builds on existing MLLM capabilities.

The paper tackles personalized recommendation without relying on user interaction logs by using task-agnostic visual histories, and shows that VisualLens improves over state-of-the-art item-based multimodal recommendations by 5-10% on Hit@3 and outperforms GPT-4o by 2-5%.

Existing recommendation systems either rely on user interaction logs, such as online shopping history for shopping recommendations, or focus on text signals. However, item-based histories are not always accessible, and are not generalizable for multimodal recommendation. We hypothesize that a user's visual history -- comprising images from daily life -- can offer rich, task-agnostic insights into their interests and preferences, and thus be leveraged for effective personalization. To this end, we propose VisualLens, a novel framework that leverages multimodal large language models (MLLMs) to enable personalization using task-agnostic visual history. VisualLens extracts, filters, and refines a spectrum user profile from the visual history to support personalized recommendation. We created two new benchmarks, Google-Review-V and Yelp-V, with task-agnostic visual histories, and show that VisualLens improves over state-of-the-art item-based multimodal recommendations by 5-10% on Hit@3, and outperforms GPT-4o by 2-5%. Further analysis shows that VisualLens is robust across varying history lengths and excels at adapting to both longer histories and unseen content categories.
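For readers unfamiliar with the Hit@3 figures quoted above, the sketch below shows one common way a Hit@k score is computed: a user counts as a hit if at least one relevant item appears among the top k recommendations, and the score is the fraction of users with a hit. This is an illustrative assumption about the metric, not the paper's evaluation code; the function names `hit_at_k` and `mean_hit_at_k` and the sample data are hypothetical.

```python
from typing import Hashable, Sequence, Set


def hit_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int = 3) -> float:
    """Return 1.0 if any of the top-k ranked items is relevant, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def mean_hit_at_k(
    rankings: Sequence[Sequence[Hashable]],
    relevants: Sequence[Set[Hashable]],
    k: int = 3,
) -> float:
    """Average Hit@k over users; rankings[i] and relevants[i] belong to user i."""
    if len(rankings) != len(relevants):
        raise ValueError("rankings and relevants must have the same length")
    if not rankings:
        return 0.0
    total = sum(hit_at_k(r, rel, k) for r, rel in zip(rankings, relevants))
    return total / len(rankings)


# Hypothetical example data: two users, four candidate places each.
rankings = [
    ["cafe", "ramen", "museum", "park"],   # "museum" is ranked 3rd: a hit
    ["bar", "gym", "bakery", "zoo"],       # "zoo" is ranked 4th: a miss
]
relevants = [{"museum"}, {"zoo"}]

score = mean_hit_at_k(rankings, relevants, k=3)
print(f"Hit@3 = {score:.2f}")
```

Under this definition, the score for the toy data is the share of users whose relevant item lands in the top three slots, so moving "zoo" up one position for the second user would raise it.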
