CV · Feb 20

Towards LLM-centric Affective Visual Customization via Efficient and Precise Emotion Manipulating

arXiv:2602.18016v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work addresses the lack of general-purpose foundation models for affective visual customization, offering a novel method for emotion manipulation in images, though it appears incremental as it builds on existing multimodal LLM capabilities.

The paper tackles the problem of visual customization by focusing on subjective emotional content, proposing an LLM-centric approach for affective visual customization that efficiently aligns emotion conversion and retains emotion-agnostic contents, achieving superior performance over state-of-the-art baselines on a constructed dataset.

Previous studies on visual customization rely primarily on objective alignment between control signals (e.g., language, layout, and Canny edges) and the edited images. They largely ignore subjective emotional content and, more importantly, lack general-purpose foundation models for affective visual customization. With this in mind, this paper proposes an LLM-centric Affective Visual Customization (L-AVC) task, which focuses on generating images while modifying their subjective emotions via a Multimodal LLM. The paper further contends that two problems are both important and challenging in L-AVC: how to make the model efficiently align emotion conversion in semantics (named inter-emotion semantic conversion) and how to precisely retain emotion-agnostic contents (named exter-emotion semantic retaining). To this end, the paper proposes an Efficient and Precise Emotion Manipulating (EPEM) approach for editing subjective emotions in images. Specifically, an Efficient Inter-emotion Converting (EIC) module is tailored to make the LLM efficiently align emotion conversion in semantics before and after editing, followed by a Precise Exter-emotion Retaining (PER) module that precisely retains the emotion-agnostic contents. Comprehensive experimental evaluations on our constructed L-AVC dataset show a clear advantage of the proposed EPEM approach over several state-of-the-art baselines on the L-AVC task. This supports the importance of emotion information for L-AVC and the effectiveness of EPEM in efficiently and precisely manipulating such information.

