Multi-objects Generation with Amortized Structural Regularization
This work addresses the challenge of generating realistic multi-object images, a problem relevant to computer vision and AI. It is an incremental advance that integrates human knowledge into existing models.
The paper tackles the failure of deep generative models to capture structure in multi-object images. It proposes the amortized structural regularization framework, which embeds human knowledge via structural constraints and yields significant improvements in inference accuracy and sample quality over baselines.
Deep generative models (DGMs) have shown promise in image generation. However, most existing work learns the model by simply optimizing a divergence between the marginal distributions of the model and the data, and often fails to capture the rich structures and relations in multi-object images. Human knowledge is critical for DGMs to infer these structures successfully. In this paper, we propose the amortized structural regularization (ASR) framework, which adopts posterior regularization (PR) to embed human knowledge into DGMs via a set of structural constraints. We derive a lower bound of the regularized log-likelihood that can be efficiently optimized jointly with respect to the generative model and the recognition model. Empirical results show that ASR significantly outperforms the DGM baselines in terms of inference accuracy and sample quality.
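To make the idea of a regularized lower bound concrete, the following is a minimal sketch of a generic penalized evidence lower bound in the spirit of posterior regularization. It is not the bound derived in the paper. The toy latent variable (an object count), the joint probabilities, the violation indicator, and the weight `lam` are all hypothetical values chosen for illustration.

```python
import math

def elbo(joint, q):
    """Evidence lower bound E_q[log p(x, z) - log q(z)] for one fixed x.

    joint[z] holds p(x, z); q[z] holds the recognition distribution q(z | x).
    """
    return sum(qz * (math.log(joint[z]) - math.log(qz))
               for z, qz in enumerate(q) if qz > 0)

def expected_violation(q, violation):
    """Expected structural-constraint violation E_q[phi(z)], with phi >= 0."""
    return sum(qz * v for qz, v in zip(q, violation))

def regularized_objective(joint, q, violation, lam):
    """ELBO minus a weighted penalty on expected constraint violation."""
    return elbo(joint, q) - lam * expected_violation(q, violation)

# Hypothetical toy setup: z is the number of objects (0, 1, or 2).
joint = [0.02, 0.10, 0.08]    # assumed p(x, z) for a single image x
violation = [0.0, 0.0, 1.0]   # assumed: z = 2 breaks a structural constraint
q = [0.2, 0.7, 0.1]           # assumed recognition-model output q(z | x)

log_px = math.log(sum(joint))                 # exact log-likelihood log p(x)
posterior = [p / sum(joint) for p in joint]   # exact posterior p(z | x)

print("log p(x):             ", log_px)
print("ELBO at q:            ", elbo(joint, q))
print("regularized objective:", regularized_objective(joint, q, violation, 5.0))
```

Because the penalty is non-negative and the ELBO never exceeds log p(x), the regularized objective stays a lower bound on the log-likelihood. Raising it with respect to both `joint` (the generative model) and `q` (the recognition model) pushes the recognition distribution away from latent configurations that violate the constraint.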