Tell Me What Is Good About This Property: Leveraging Reviews For Segment-Personalized Image Collection Summarization
This work addresses the need for user-aligned visual summaries of web content, specifically property listings at Booking.com. It is an incremental improvement that incorporates review data without requiring costly annotations.
The paper tackles personalized image collection summarization for property listings by using user reviews to identify the aspects that matter most to users. In human perceptual studies, the resulting visual summaries outperform the no-personalization and image-based clustering baselines.
Image collection summarization techniques aim to present a compact representation of an image gallery through a carefully selected subset of images that captures its semantic content. For web content, however, the ideal selection can vary with the user's specific intentions and preferences. This is particularly relevant at Booking.com, where it is crucial to present properties, and their visual summaries, in a way that aligns with users' expectations. To address this challenge, we account for user intentions when summarizing property visuals by analyzing property reviews and extracting the most significant aspects mentioned by users. Incorporating these review insights lets our visual summaries present the content that is relevant to a given user, and we achieve this without the need for costly annotations. Our experiments, including human perceptual studies, demonstrate the superiority of our cross-modal approach, which we call CrossSummarizer, over the no-personalization and image-based clustering baselines.
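As a rough illustration of the review-guided idea described above, the following toy sketch ranks aspects by how many reviews mention them and then picks one image per aspect in that order. It is not the CrossSummarizer implementation, whose details the abstract does not give. The keyword vocabulary `ASPECT_KEYWORDS`, the functions `rank_aspects` and `summarize`, and the assumption that every image already carries an aspect label are all hypothetical and introduced here only for illustration.

```python
from collections import Counter

# Hypothetical aspect vocabulary; the source does not specify one.
ASPECT_KEYWORDS = {
    "pool": {"pool", "swimming"},
    "breakfast": {"breakfast", "buffet"},
    "view": {"view", "balcony"},
    "room": {"room", "bed"},
}


def rank_aspects(reviews):
    """Return aspects ordered by the number of reviews that mention them."""
    counts = Counter()
    for review in reviews:
        # Crude tokenization: lowercase, drop commas and periods, split on spaces.
        cleaned = review.lower().replace(",", " ").replace(".", " ")
        words = set(cleaned.split())
        for aspect, keywords in ASPECT_KEYWORDS.items():
            if words & keywords:
                counts[aspect] += 1
    return [aspect for aspect, _ in counts.most_common()]


def summarize(images, reviews, k=3):
    """Select up to k image ids, one per aspect, most-mentioned aspect first.

    `images` is a list of (image_id, aspect_label) pairs. The labels are
    assumed to come from some upstream image tagger (an assumption of this
    sketch, not something stated in the source).
    """
    summary, used = [], set()
    for aspect in rank_aspects(reviews):
        if len(summary) >= k:
            break
        for image_id, label in images:
            if label == aspect and image_id not in used:
                summary.append(image_id)
                used.add(image_id)
                break
    # If the reviews cover fewer than k aspects, fill up in gallery order.
    for image_id, _ in images:
        if len(summary) >= k:
            break
        if image_id not in used:
            summary.append(image_id)
            used.add(image_id)
    return summary


reviews = [
    "Great pool and lovely view.",
    "The pool was clean.",
    "Breakfast buffet was fine, pool too.",
]
images = [
    ("img1", "room"),
    ("img2", "pool"),
    ("img3", "view"),
    ("img4", "breakfast"),
    ("img5", "pool"),
]
print(summarize(images, reviews, k=2))
```

In this toy data the pool is mentioned in every review, so a pool image leads the summary even though a room image comes first in the gallery. A no-personalization baseline that simply keeps gallery order would instead start with the room image.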