IMPRINT: Generative Object Compositing by Learning Identity-Preserving Representation
This work addresses a barrier to practical use of compositional image editing: preserving the identity of the composited object. The contribution is incremental in that it builds on existing diffusion-based methods, to which it adds a novel training framework.
The paper tackles object identity preservation in generative object compositing for image editing. It introduces IMPRINT, a diffusion-based model trained with a two-stage learning framework that decouples identity preservation from compositing. The authors report that it significantly outperforms existing methods in both identity preservation and composition quality.
Generative object compositing emerges as a promising new avenue for compositional image editing. However, the requirement of object identity preservation poses a significant challenge, limiting practical usage of most existing methods. In response, this paper introduces IMPRINT, a novel diffusion-based generative model trained with a two-stage learning framework that decouples learning of identity preservation from that of compositing. The first stage targets context-agnostic, identity-preserving pretraining of the object encoder, enabling the encoder to learn an embedding that is both view-invariant and conducive to enhanced detail preservation. The subsequent stage leverages this representation to learn seamless harmonization of the object composited onto the background. In addition, IMPRINT incorporates a shape-guidance mechanism offering user-directed control over the compositing process. Extensive experiments demonstrate that IMPRINT significantly outperforms existing methods and various baselines on identity preservation and composition quality.
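The two-stage decoupling described in the abstract can be illustrated with a toy training schedule. This is a minimal sketch and not the authors' implementation: the `Module` class, the names `object_encoder` and `compositing_model`, and the step counts are hypothetical, and the choice to freeze the encoder during the second stage is an assumption that the abstract does not state. The sketch only tracks which component receives updates in each stage; it does not train a real network.

```python
from dataclasses import dataclass


@dataclass
class Module:
    """Hypothetical stand-in for a network component.

    It stores only a trainable flag and a count of updates received.
    """
    name: str
    trainable: bool = True
    updates: int = 0

    def step(self) -> None:
        # Count an optimizer update only when the module is trainable.
        if self.trainable:
            self.updates += 1


def run_stage(modules, trainable_names, num_steps):
    """Run num_steps updates with only the named modules trainable."""
    for module in modules:
        module.trainable = module.name in trainable_names
    for _ in range(num_steps):
        for module in modules:
            module.step()


def two_stage_schedule(stage1_steps, stage2_steps):
    """Return the number of updates each module received over both stages."""
    encoder = Module("object_encoder")
    compositor = Module("compositing_model")
    modules = [encoder, compositor]

    # Stage 1: context-agnostic, identity-preserving pretraining of the
    # object encoder. No background compositing is learned here.
    run_stage(modules, {"object_encoder"}, stage1_steps)

    # Stage 2: learn compositing on top of the pretrained representation.
    # ASSUMPTION: the encoder is frozen in this stage; the abstract does
    # not say whether it is frozen or fine-tuned.
    run_stage(modules, {"compositing_model"}, stage2_steps)

    return {module.name: module.updates for module in modules}


if __name__ == "__main__":
    print(two_stage_schedule(stage1_steps=3, stage2_steps=5))
```

Under these assumptions, each component is updated only during its own stage, which is the sense in which identity preservation is learned separately from compositing.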