Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization
This work addresses information leakage and incoherent synthesis in visual concept personalization, and it proposes a new method rather than an incremental improvement.
The paper introduces Omni-Attribute, an open-vocabulary image attribute encoder that learns high-fidelity, attribute-specific representations so that a single attribute, such as identity or style, can be isolated. It reports state-of-the-art performance across multiple benchmarks.
Visual concept personalization aims to transfer only specific image attributes, such as identity, expression, lighting, and style, into unseen contexts. However, existing methods rely on holistic embeddings from general-purpose image encoders, which entangle multiple visual factors and make it difficult to isolate a single attribute. This often leads to information leakage and incoherent synthesis. To address this limitation, we introduce Omni-Attribute, the first open-vocabulary image attribute encoder designed to learn high-fidelity, attribute-specific representations. Our approach jointly designs the data and model: (i) we curate semantically linked image pairs annotated with positive and negative attributes to explicitly teach the encoder what to preserve or suppress; and (ii) we adopt a dual-objective training paradigm that balances generative fidelity with contrastive disentanglement. The resulting embeddings prove effective for open-vocabulary attribute retrieval, personalization, and compositional generation, achieving state-of-the-art performance across multiple benchmarks.
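The dual-objective idea described above can be illustrated with a small sketch. The abstract does not give the loss formulas, so everything here is an assumption made for illustration: the mean-squared-error stand-in for the generative term, the two-way softmax form of the contrastive term, the `temperature` and `weight` parameters, and all function names are hypothetical and are not taken from the paper. The sketch assumes an image pair that shares a positive attribute (for example, identity) and differs in a negative attribute (for example, lighting), with one embedding per image per attribute.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def contrastive_term(pos_a, pos_b, neg_a, neg_b, temperature=0.1):
    """Hypothetical contrastive term for one image pair (A, B).

    pos_a, pos_b: embeddings of A and B for the shared (positive) attribute.
    neg_a, neg_b: embeddings of A and B for the differing (negative) attribute.
    The term is small when the positive-attribute embeddings agree and the
    negative-attribute embeddings do not. It equals
    -log(exp(s_pos/t) / (exp(s_pos/t) + exp(s_neg/t))).
    """
    s_pos = cosine(pos_a, pos_b)
    s_neg = cosine(neg_a, neg_b)
    return softplus((s_neg - s_pos) / temperature)


def generative_term(predicted, target):
    """Stand-in for a generative fidelity loss: mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def dual_objective(predicted, target, pos_a, pos_b, neg_a, neg_b,
                   weight=1.0, temperature=0.1):
    """Weighted sum of the two terms; `weight` is an assumed hyperparameter."""
    gen = generative_term(predicted, target)
    con = contrastive_term(pos_a, pos_b, neg_a, neg_b, temperature)
    return gen + weight * con


if __name__ == "__main__":
    # Toy embeddings: identity agrees across the pair, lighting does not.
    identity_a, identity_b = [1.0, 0.0], [1.0, 0.0]
    lighting_a, lighting_b = [1.0, 0.0], [0.0, 1.0]
    loss = dual_objective([0.5, 0.5], [0.5, 0.0],
                          identity_a, identity_b, lighting_a, lighting_b)
    print(f"toy dual-objective loss: {loss:.6f}")
```

In this toy form, raising `weight` favors disentanglement over reconstruction quality, and lowering it does the opposite. The balance actually used by the authors is not stated in the abstract.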