CV

May 24, 2023

T1: Scaling Diffusion Probabilistic Fields to High-Resolution on Unified Visual Modalities

arXiv:2305.14674v1

3.91 citations

Originality: Incremental advance

AI Analysis

The paper addresses the resolution-scaling limitation of Diffusion Probabilistic Fields (DPF), which matters to AI/ML researchers and practitioners working on unified visual content generation. It is an incremental improvement over existing DPF methods.

The paper tackled the problem of scaling Diffusion Probabilistic Fields (DPF) to high-resolution data across unified visual modalities like images and videos, which was limited by difficulty in capturing local structures. The result was a new model with view-wise sampling and guidance that scaled effectively, demonstrating potential as a foundation framework for scalable modality-unified generation.

Diffusion Probabilistic Field (DPF) models the distribution of continuous functions defined over metric spaces. While DPF shows great potential for unifying data generation across modalities including images, videos, and 3D geometry, it does not scale to higher data resolutions. This can be attributed to the "scaling property": it is difficult for the model to capture local structures through uniform sampling. To this end, we propose a new model comprising a view-wise sampling algorithm that focuses on local structure learning, together with additional guidance, e.g., text descriptions, that complements the global geometry. The model can be scaled to generate high-resolution data while unifying multiple modalities. Experimental results on data generation in various modalities demonstrate the effectiveness of our model, as well as its potential as a foundation framework for scalable modality-unified visual content generation.
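
The contrast between uniform and view-wise sampling can be illustrated with a toy sketch. This is not the paper's algorithm, which the abstract does not specify. It assumes, purely for illustration, that a "view" is a single video frame and that a field is observed as a set of (t, y, x) coordinates. The function names and the coverage measure are hypothetical.

```python
import random

def uniform_sample(num_frames, height, width, budget, rng):
    """Draw `budget` distinct (t, y, x) coordinates uniformly over the whole video."""
    total = num_frames * height * width
    flat = rng.sample(range(total), budget)
    # Decode each flat index back into (frame, row, column).
    return [(i // (height * width), (i // width) % height, i % width) for i in flat]

def view_wise_sample(num_frames, height, width, num_views, rng):
    """Pick `num_views` frames (assumed "views"), then keep every pixel in each one."""
    frames = rng.sample(range(num_frames), num_views)
    return [(t, y, x)
            for t in sorted(frames)
            for y in range(height)
            for x in range(width)]

def coverage_per_touched_frame(coords, height, width):
    """Average fraction of pixels observed within frames that received any sample."""
    counts = {}
    for t, _, _ in coords:
        counts[t] = counts.get(t, 0) + 1
    return sum(counts.values()) / (len(counts) * height * width)

rng = random.Random(0)
T, H, W = 16, 8, 8
budget = 2 * H * W  # same number of coordinates for both strategies

uniform = uniform_sample(T, H, W, budget, rng)
views = view_wise_sample(T, H, W, num_views=2, rng=rng)

print("uniform coverage per touched frame:", coverage_per_touched_frame(uniform, H, W))
print("view-wise coverage per touched frame:", coverage_per_touched_frame(views, H, W))
```

With the same coordinate budget, the view-wise variant observes each selected frame densely, while uniform draws are spread thinly across many frames. That thin spread is one way to read the abstract's claim that uniform sampling makes local structure hard to capture as resolution grows.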
