CV | AI | Jun 10, 2025

Product of Experts for Visual Generation

arXiv:2506.08894v2 | 6 citations | h-index: 13
Originality: Incremental advance
AI Analysis

This work addresses the challenge of combining heterogeneous models for visual generation, offering a training-free solution that enhances controllability, though it appears incremental as it builds on existing methods like Annealed Importance Sampling.

The paper tackles the problem of integrating diverse knowledge from multiple sources for visual generation by proposing a Product of Experts framework that performs inference-time knowledge composition, resulting in better controllability and flexible user interfaces for image and video synthesis tasks.

Modern neural models capture rich priors and have complementary knowledge over shared data domains, e.g., images and videos. Integrating diverse knowledge from multiple sources -- including visual generative models, visual language models, and sources with human-crafted knowledge such as graphics engines and physics simulators -- remains under-explored. We propose a Product of Experts (PoE) framework that performs inference-time knowledge composition from heterogeneous models. This training-free approach samples from the product distribution across experts via Annealed Importance Sampling (AIS). Our framework shows practical benefits in image and video synthesis tasks, yielding better controllability than monolithic methods and additionally providing flexible user interfaces for specifying visual generation goals.
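
To make the sampling idea concrete, the following is a minimal toy sketch of Annealed Importance Sampling from a product of experts. It is not the paper's implementation: the experts here are assumed to be 1D Gaussians (the paper's experts are generative models, visual language models, and simulators), and every name and parameter (`ais_product_of_experts`, `log_gauss`, the base distribution, the step size) is hypothetical and chosen for illustration. Particles start at a broad base distribution and are annealed along a geometric path toward the unnormalized product density, accumulating importance weights along the way.

```python
import numpy as np


def log_gauss(x, mu, sigma):
    """Log-density of a 1D Gaussian N(mu, sigma^2)."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))


def log_product(x, experts):
    """Unnormalized log-density of the product of all experts."""
    return sum(log_gauss(x, mu, sigma) for mu, sigma in experts)


def ais_product_of_experts(experts, n_particles=2000, n_steps=100,
                           step_size=0.5, seed=0):
    """Toy AIS: returns particles and normalized importance weights.

    Assumption: a broad Gaussian base distribution N(0, 5^2) and a
    random-walk Metropolis-Hastings kernel at each temperature.
    """
    rng = np.random.default_rng(seed)
    base_mu, base_sigma = 0.0, 5.0
    x = rng.normal(base_mu, base_sigma, size=n_particles)
    log_w = np.zeros(n_particles)
    betas = np.linspace(0.0, 1.0, n_steps + 1)

    def log_f(z, beta):
        # Geometric path between the base (beta=0) and the product (beta=1).
        return ((1.0 - beta) * log_gauss(z, base_mu, base_sigma)
                + beta * log_product(z, experts))

    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Importance-weight update for moving from b_prev to b.
        log_w += log_f(x, b) - log_f(x, b_prev)
        # One Metropolis-Hastings step that leaves the density at b invariant.
        proposal = x + step_size * rng.normal(size=n_particles)
        log_accept = log_f(proposal, b) - log_f(x, b)
        accept = np.log(rng.uniform(size=n_particles)) < log_accept
        x = np.where(accept, proposal, x)

    # Normalize weights in a numerically stable way.
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return x, w


# Two hypothetical experts that disagree about where the sample should be.
experts = [(-1.0, 1.0), (2.0, 0.5)]
samples, weights = ais_product_of_experts(experts)
weighted_mean = float(np.sum(weights * samples))
```

For Gaussian experts the product is itself Gaussian with precision equal to the sum of the experts' precisions, so this toy case can be checked in closed form: the two experts above give a product mean of (-1 * 1 + 2 * 4) / 5 = 1.4, and `weighted_mean` should land close to that value. The sharper expert pulls the result toward itself, which is the sense in which a product distribution enforces all experts' constraints at once.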
