SerialGen: Personalized Image Generation by First Standardization Then Personalization
This work addresses the challenge of maintaining appearance consistency in personalized image generation, for applications such as digital avatars and content creation. The contribution appears incremental, since it builds on existing personalization methods.
The paper tackles the problem of generating personalized human character images with both high text controllability and whole-body appearance consistency. It proposes SerialGen, a two-stage framework that first standardizes the reference images and then personalizes generation from the standardized reference, which improves appearance consistency and yields accurate responses to diverse text prompts.
In this work, we are interested in achieving both high text controllability and whole-body appearance consistency in the generation of personalized human characters. We propose a novel framework, named SerialGen, which is a serial generation method consisting of two stages: first, a standardization stage that standardizes reference images, and then a personalized generation stage based on the standardized reference. Furthermore, we introduce two modules aimed at enhancing the standardization process. Our experimental results validate the proposed framework's ability to produce personalized images that faithfully recover the reference image's whole-body appearance while accurately responding to a wide range of text prompts. Through thorough analysis, we highlight the critical contribution of the proposed serial generation method and standardization model, evidencing enhancements in appearance consistency between reference and output images and across serial outputs generated from diverse text prompts. The term "Serial" in this work carries a double meaning: it refers to the two-stage method and also underlines our ability to generate serial images with consistent appearance throughout.
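The serial structure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `SerialPipeline`, its fields, and the toy stage functions are hypothetical names, and images are represented as plain strings. Reusing one standardized reference for every prompt is also an assumption made here for illustration. The sketch only shows how a standardization stage feeds a personalization stage across a series of prompts.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical stand-in: an "image" is just a string in this sketch.
Image = str


@dataclass
class SerialPipeline:
    """Two-stage composition: standardize first, then personalize (illustrative only)."""

    standardize: Callable[[Image], Image]        # stage 1, hypothetical signature
    personalize: Callable[[Image, str], Image]   # stage 2, hypothetical signature

    def generate_series(self, reference: Image, prompts: Sequence[str]) -> List[Image]:
        # Assumption: the reference is standardized once and reused for all prompts.
        standardized = self.standardize(reference)
        # Each prompt is conditioned on the same standardized reference.
        return [self.personalize(standardized, prompt) for prompt in prompts]


def toy_standardize(reference: Image) -> Image:
    # Placeholder for a standardization model.
    return f"std({reference})"


def toy_personalize(standardized: Image, prompt: str) -> Image:
    # Placeholder for a personalized generation model.
    return f"gen[{standardized} | {prompt}]"


pipeline = SerialPipeline(standardize=toy_standardize, personalize=toy_personalize)
outputs = pipeline.generate_series(
    "ref.png", ["walking on a beach", "reading in a library"]
)
for image in outputs:
    print(image)
```

In this sketch every output in the series derives from the same standardized reference, which mirrors the second meaning of "Serial" given above: a series of images intended to share a consistent appearance.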