AnimalBooth: Multimodal Feature Enhancement for Animal Subject Personalization
This work addresses personalized animal image generation and is aimed at researchers and practitioners in AI and computer vision. It is an incremental advance with domain-specific applications.
The paper tackles personalized animal image generation by addressing feature misalignment and identity drift. It introduces AnimalBooth, which combines modules for identity preservation with frequency-controlled feature integration. AnimalBooth outperforms baselines on benchmarks, improving both identity fidelity and perceptual quality.
Personalized animal image generation is challenging due to rich appearance cues and large morphological variability. Existing approaches often exhibit feature misalignment across domains, which leads to identity drift. We present AnimalBooth, a framework that strengthens identity preservation with an Animal Net and an adaptive attention module, mitigating cross-domain alignment errors. We further introduce a frequency-controlled feature integration module that applies Discrete Cosine Transform (DCT) filtering in the latent space to guide the diffusion process, enabling a coarse-to-fine progression from global structure to detailed texture. To advance research in this area, we curate AnimalBench, a high-resolution dataset for animal personalization. Extensive experiments show that AnimalBooth consistently outperforms strong baselines on multiple benchmarks and improves both identity fidelity and perceptual quality.
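To make the idea of DCT filtering in the latent space concrete, the following is a minimal sketch of a low-pass DCT filter whose cutoff widens over the denoising steps, so that early steps see only global structure and later steps see fine texture. It is an illustration under stated assumptions, not the AnimalBooth implementation: the function names (dct_matrix, dct_lowpass, keep_ratio_schedule), the square low-frequency mask, the linear schedule, and the (C, H, W) latent layout are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Return the orthonormal DCT-II basis matrix of size (n, n)."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # DC row uses a different normalization
    return m


def dct_lowpass(latent: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Keep only the lowest-frequency DCT coefficients of each channel.

    latent: array of shape (C, H, W) (assumed layout).
    keep_ratio: fraction of frequencies kept along each axis, in (0, 1].
    """
    _, h, w = latent.shape
    dh, dw = dct_matrix(h), dct_matrix(w)
    # Separable 2D DCT, broadcast over the channel axis.
    coeffs = dh @ latent @ dw.T
    kh = max(1, int(round(keep_ratio * h)))
    kw = max(1, int(round(keep_ratio * w)))
    mask = np.zeros((h, w))
    mask[:kh, :kw] = 1.0  # square low-frequency mask (an assumption)
    # The basis is orthonormal, so the inverse is the transpose.
    return dh.T @ (coeffs * mask) @ dw


def keep_ratio_schedule(step: int, num_steps: int,
                        start: float = 0.25, end: float = 1.0) -> float:
    """Linear coarse-to-fine schedule (hypothetical): the cutoff grows with step."""
    t = step / max(1, num_steps - 1)
    return start + (end - start) * t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.standard_normal((4, 16, 16))  # stand-in for latent features
    for step in range(5):
        ratio = keep_ratio_schedule(step, 5)
        filtered = dct_lowpass(features, ratio)
        print(step, round(ratio, 3), float(np.linalg.norm(filtered)))
```

In this sketch the filtered features would be what is injected into the denoiser at each step; how the actual module fuses them with the diffusion process is not described in the abstract and is left out here.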