Zeeshan Patel

15.8CVNov 12, 2024Code

Scaling Properties of Diffusion Models for Perceptual Tasks

Rahul Ravishankar, Zeeshan Patel, Jathushan Rajasegaran et al.

In this paper, we argue that iterative computation with diffusion models offers a powerful paradigm for not only generation but also visual perception tasks. We unify tasks such as depth estimation, optical flow, and amodal segmentation under the framework of image-to-image translation, and show how diffusion models benefit from scaling training and test-time compute for these perceptual tasks. Through a careful analysis of these scaling properties, we formulate compute-optimal training and inference recipes to scale diffusion models for visual perception tasks. Our models achieve competitive performance to state-of-the-art methods using significantly less data and compute. To access our code and models, see https://scaling-diffusion-perception.github.io .

12.5LGDec 15, 2024

Exploring Diffusion and Flow Matching Under Generator Matching

Zeeshan Patel, James DeLoye, Lance Mathias

In this paper, we present a comprehensive theoretical comparison of diffusion and flow matching under the Generator Matching framework. Despite their apparent differences, both diffusion and flow matching can be viewed under the unified framework of Generator Matching. By recasting both diffusion and flow matching under the same generative Markov framework, we provide theoretical insights into why flow matching models can be more robust empirically and how novel model classes can be constructed by mixing deterministic and stochastic components. Our analysis offers a fresh perspective on the relationships between state-of-the-art generative modeling paradigms.

Zeeshan Patel

2 Papers