SeqLoRA: Bilevel Orthogonal Adaptation for Continual Multi-Concept Generation
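For practitioners of personalized text-to-image generation, SeqLoRA addresses the trade-off that existing modular methods face when composing multiple custom concepts: they either pay for expensive post-hoc fusion or freeze adaptation subspaces, which costs expressiveness and concept fidelity.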
SeqLoRA introduces a bilevel optimization framework for continual multi-concept generation in diffusion models, enabling joint learning of LoRA factors to reduce representation interference. It achieves improved identity preservation and scalability across up to 101 concepts without costly fusion.
Parameter-efficient fine-tuning enables fast personalization of text-to-image diffusion models, but composing multiple custom concepts remains challenging due to representation interference. Existing modular methods either rely on expensive post-hoc fusion or freeze adaptation subspaces, which limits expressiveness and concept fidelity. To address this trade-off, we propose Sequential regularized LoRA (SeqLoRA), a constrained continual learning framework that jointly optimizes both LoRA factors via bilevel optimization. Theoretically, we establish strong convergence guarantees for our algorithm and model the residual layer activations as a matrix sub-Gaussian process to derive high-probability bounds on catastrophic forgetting. We further prove that learning the LoRA basis from data minimizes residual interference energy more effectively than frozen-basis methods. Experiments on multi-concept image generation demonstrate that SeqLoRA improves identity preservation and scalability across up to 101 concepts, while avoiding costly fusion and reducing attribute interference in composed generations.
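To make "representation interference" concrete, the following is a minimal toy sketch, not the SeqLoRA algorithm or its bilevel optimizer. It assumes a single linear layer, a LoRA update of the standard form ΔW = BA, and old-concept inputs that lie exactly in the row space of the first concept's down-projection A1 (a strong simplification). The function names `lora_delta` and `interference_energy`, and the energy definition itself, are illustrative assumptions rather than quantities defined in the abstract.

```python
import numpy as np

def lora_delta(B, A):
    """Low-rank weight update Delta W = B @ A, with B: (d_out, r) and A: (r, d_in)."""
    return B @ A

def interference_energy(delta_new, X_old):
    """Hypothetical measure: mean squared norm of the new concept's update
    applied to inputs that belong to an earlier concept."""
    residual = X_old @ delta_new.T          # (n, d_out)
    return float(np.mean(np.sum(residual ** 2, axis=1)))

rng = np.random.default_rng(0)
d_in, d_out, r, n = 32, 16, 4, 64

# Orthonormal basis of the input space, used to build orthogonal down-projections.
Q, _ = np.linalg.qr(rng.standard_normal((d_in, d_in)))
A1 = Q[:, :r].T                  # concept 1 down-projection (rows are orthonormal)
A2_orth = Q[:, r:2 * r].T        # concept 2 down-projection, orthogonal to A1
A2_rand = rng.standard_normal((r, d_in)) / np.sqrt(d_in)  # unconstrained alternative

B2 = rng.standard_normal((d_out, r))  # concept 2 up-projection (shared for both cases)

# Assumption: old-concept inputs live in the row space of A1.
X_old = rng.standard_normal((n, r)) @ A1

e_orth = interference_energy(lora_delta(B2, A2_orth), X_old)
e_rand = interference_energy(lora_delta(B2, A2_rand), X_old)

print(f"orthogonal basis:    {e_orth:.3e}")
print(f"unconstrained basis: {e_rand:.3e}")
```

In this toy setting the orthogonal down-projection leaves old-concept inputs essentially untouched, while the unconstrained one does not. The abstract's claim is about which basis to choose: it argues that a basis learned from data reduces residual interference more effectively than a frozen one, which this sketch does not attempt to reproduce.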