Enhancing Quantum Diffusion Models for Complex Image Generation
This work addresses mode collapse and limited generative capacity in quantum diffusion models for complex image generation and offers a feasible pathway for the NISQ era. The contribution is incremental, since it builds on existing hybrid quantum-classical approaches.
The authors tackle the limited scalability and expressibility of quantum generative models on multi-modal distributions by proposing a Hybrid Quantum-Classical U-Net with Adaptive Non-Local Observables (ANO). The model generates structurally coherent, recognizable images for all digit classes of the full MNIST dataset.
Quantum generative models offer a novel approach to exploring high-dimensional Hilbert spaces but face significant challenges in scalability and expressibility when applied to multi-modal distributions. In this study, we explore a Hybrid Quantum-Classical U-Net architecture integrated with Adaptive Non-Local Observables (ANO) as a potential solution to these hurdles. By compressing classical data into a dense quantum latent space and utilizing trainable observables, our model aims to extract non-local features that complement classical processing. We also investigate the role of skip connections in preserving semantic information during the reverse diffusion process. Experimental results on the full MNIST dataset (digits 0-9) demonstrate that the proposed architecture is capable of generating structurally coherent and recognizable images for all digit classes. While hardware constraints still limit resolution, our findings suggest that hybrid architectures with adaptive measurements provide a feasible pathway for mitigating mode collapse and enhancing generative capabilities in the noisy intermediate-scale quantum (NISQ) era.
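To make the idea of a trainable observable on a compressed quantum latent concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes amplitude encoding for the "dense quantum latent space" and reads an adaptive non-local observable as a parameterized Hermitian matrix acting on k adjacent qubits. The abstract does not confirm either choice. All function names (`amplitude_encode`, `hermitian_from_params`, `expectation`) are hypothetical and introduced only for illustration.

```python
import numpy as np

def amplitude_encode(x, n_qubits):
    """Assumed encoding: zero-pad (or truncate) a real vector to length
    2**n_qubits and L2-normalize it so it is a valid state vector."""
    dim = 2 ** n_qubits
    x = np.asarray(x, dtype=float).ravel()[:dim]
    state = np.zeros(dim, dtype=complex)
    state[: x.size] = x
    norm = np.linalg.norm(state)
    if norm == 0:
        raise ValueError("cannot encode the zero vector")
    return state / norm

def hermitian_from_params(params, k):
    """Build a 2**k x 2**k Hermitian matrix from 4**k real parameters.
    Diagonal entries are used directly; the strict upper triangle supplies
    real parts and the strict lower triangle supplies imaginary parts."""
    d = 2 ** k
    m = np.asarray(params, dtype=float).reshape(d, d)
    off = np.triu(m, 1) + 1j * np.tril(m, -1).T  # strictly upper triangular
    return np.diag(np.diag(m)).astype(complex) + off + off.conj().T

def expectation(state, obs, n_qubits, first_qubit=0):
    """Return <psi| I (x) obs (x) I |psi>, with obs acting on k adjacent
    qubits starting at index first_qubit (hypothetical layout)."""
    k = obs.shape[0].bit_length() - 1
    left = np.eye(2 ** first_qubit)
    right = np.eye(2 ** (n_qubits - first_qubit - k))
    full = np.kron(np.kron(left, obs), right)
    return float(np.real(np.vdot(state, full @ state)))

# Usage: 8 latent features on 3 qubits, one 2-qubit trainable observable.
rng = np.random.default_rng(0)
latent = rng.normal(size=8)           # stand-in for a classical bottleneck feature
psi = amplitude_encode(latent, n_qubits=3)
theta = rng.normal(size=16)           # 4**k parameters for k = 2
H = hermitian_from_params(theta, k=2)
feature = expectation(psi, H, n_qubits=3, first_qubit=1)
```

In a hybrid model of this kind, `theta` would be optimized jointly with the classical weights, so the measurement itself adapts to the data rather than being fixed to a Pauli string. The dense matrix construction here scales exponentially in the number of qubits and is meant only to show the structure on a toy example.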