GimmBO: Interactive Generative Image Model Merging via Bayesian Optimization
GimmBO targets users of generative image models who need to merge multiple adapters efficiently; the contribution is incremental in that it builds on existing optimization methods.
The paper addresses the problem of exploring the vast design space of adapter merging for diffusion-based image generation, a task currently done by manual tuning that scales poorly. It proposes GimmBO, an interactive system based on Preferential Bayesian Optimization, which showed improved convergence and high success rates in evaluations.
Fine-tuning-based adaptation is widely used to customize diffusion-based image generation, leading to large collections of community-created adapters that capture diverse subjects and styles. Adapters derived from the same base model can be merged with weights, enabling the synthesis of new visual results within a vast and continuous design space. To explore this space, current workflows rely on manual slider-based tuning, an approach that scales poorly and makes weight selection difficult, even when the candidate set is limited to 20-30 adapters. We propose GimmBO to support interactive exploration of adapter merging for image generation through Preferential Bayesian Optimization (PBO). Motivated by observations from real-world usage, including sparsity and constrained weight ranges, we introduce a two-stage BO backend that improves sampling efficiency and convergence in high-dimensional spaces. We evaluate our approach with simulated users and a user study, demonstrating improved convergence, high success rates, and consistent gains over BO and line-search baselines, and further show the flexibility of the framework through several extensions.
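To make the design space concrete, the following is a minimal sketch of weighted adapter merging, assuming each adapter can be represented as a flat list of weight deltas relative to the shared base model and that merging is a simple linear combination. The function names `merge_adapters` and `sample_sparse_weights`, the linear merge rule, and the example weight range are illustrative assumptions, not the paper's implementation; the sparse sampler only illustrates the sparsity and constrained-range observations mentioned in the abstract.

```python
import random


def merge_adapters(base, deltas, weights):
    """Return base + sum_i weights[i] * deltas[i], element-wise over flat lists.

    Assumption: adapters are plain weight deltas merged linearly.
    """
    if len(deltas) != len(weights):
        raise ValueError("one weight per adapter is required")
    merged = list(base)
    for delta, w in zip(deltas, weights):
        if w == 0.0:
            continue  # an inactive adapter contributes nothing
        for j, d in enumerate(delta):
            merged[j] += w * d
    return merged


def sample_sparse_weights(n_adapters, k_active, low, high, rng):
    """Sample a weight vector with at most k_active nonzero entries in [low, high].

    Hypothetical helper: mimics a sparse, range-constrained candidate.
    """
    weights = [0.0] * n_adapters
    for i in rng.sample(range(n_adapters), k_active):
        weights[i] = rng.uniform(low, high)
    return weights


# Toy example: three adapters over a two-parameter "model".
base = [1.0, 2.0]
deltas = [[0.5, 0.0], [0.0, -1.0], [1.0, 1.0]]
merged = merge_adapters(base, deltas, [0.5, 0.0, 1.0])
print(merged)  # [2.25, 3.0]
```

With 20-30 adapters, each candidate is a point in a 20-30 dimensional weight space, which is why exploring it one slider at a time scales poorly and why a sample-efficient search such as PBO is attractive.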