IR | AI | MM | Oct 18, 2024

Personalized Image Generation with Large Multimodal Models

arXiv:2410.14170v2 | 13 citations | h-index: 19 | WWW
Originality: Incremental advance
AI Analysis

This work addresses personalized image generation for users. It is assessed as an incremental advance in a domain-specific area with limited prior research.

The paper proposes the Pigeon framework for personalized image generation. Pigeon uses large multimodal models and a two-stage preference alignment scheme to capture user preferences from noisy data. Quantitative results and human evaluation on sticker and movie poster generation show it outperforming various generative baselines.

Personalized content filtering, such as recommender systems, has become a critical infrastructure to alleviate information overload. However, these systems merely filter existing content and are constrained by its limited diversity, making it difficult to meet users' varied content needs. To address this limitation, personalized content generation has emerged as a promising direction with broad applications. Nevertheless, most existing research focuses on personalized text generation, with relatively little attention given to personalized image generation. The limited work in personalized image generation faces challenges in accurately capturing users' visual preferences and needs from noisy user-interacted images and complex multimodal instructions. Worse still, there is a lack of supervised data for training personalized image generation models. To overcome the challenges, we propose a Personalized Image Generation Framework named Pigeon, which adopts exceptional large multimodal models with three dedicated modules to capture users' visual preferences and needs from noisy user history and multimodal instructions. To alleviate the data scarcity, we introduce a two-stage preference alignment scheme, comprising masked preference reconstruction and pairwise preference alignment, to align Pigeon with the personalized image generation task. We apply Pigeon to personalized sticker and movie poster generation, where extensive quantitative results and human evaluation highlight its superiority over various generative baselines.
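
The abstract names the two alignment stages without giving their objectives, so the following is only a rough sketch of what each stage could look like in its simplest generic form. Everything here is an assumption rather than Pigeon's actual method: the helper names `mask_history` and `pairwise_preference_loss`, the `<MASK>` token, the masking ratio, and the choice of a Bradley-Terry style loss are all hypothetical.

```python
import math
import random


def mask_history(history, mask_ratio=0.3, mask_token="<MASK>", seed=0):
    """Hide a random subset of a user's history items (hypothetical helper).

    Returns the masked sequence and a dict of {position: original item}
    that a model would be trained to reconstruct.
    """
    rng = random.Random(seed)
    # Mask at least one item when the history is non-empty, never more than all.
    n_mask = min(len(history), max(1, round(len(history) * mask_ratio)))
    masked_idx = set(rng.sample(range(len(history)), n_mask))
    masked = [mask_token if i in masked_idx else item
              for i, item in enumerate(history)]
    targets = {i: history[i] for i in masked_idx}
    return masked, targets


def pairwise_preference_loss(score_preferred, score_rejected):
    """Bradley-Terry style loss: -log(sigmoid(s_preferred - s_rejected)).

    This is a generic pairwise objective, assumed for illustration only.
    It is computed as softplus(-margin) in a numerically stable form.
    """
    margin = score_preferred - score_rejected
    return math.log1p(math.exp(-abs(margin))) + max(-margin, 0.0)


if __name__ == "__main__":
    history = [f"img_{i}" for i in range(10)]  # placeholder image identifiers
    masked, targets = mask_history(history)
    print(masked)
    print(targets)
    # The loss is small when the preferred image is scored higher.
    print(pairwise_preference_loss(2.0, 0.0))
    print(pairwise_preference_loss(0.0, 2.0))
```

In this sketch, the first stage would train the model to recover the hidden history items from the visible ones, and the second would push the score of a preferred image above that of a less preferred one for the same user.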

Code Implementations: 1 repo