Variance Reduction for Expectations with Diffusion Teachers

Jesse Bettencourt, Xindi Wu, Matan Atzmon, James Lucas, Jonathan Lorraine

arXiv:2605.2148988.2

Predicted impact: top 9% in LG · last 90 days
Originality: Incremental advance

AI Analysis

For practitioners using diffusion models as frozen teachers in downstream pipelines, CARV reduces computational cost by reusing expensive upstream computations across noise resamples.

The paper introduces CARV, a compute-aware variance reduction framework for Monte Carlo expectations over diffusion teacher gradients. It achieves 2-3x effective compute multipliers in text-to-3D distillation and attribution tasks. In single-step distillation it reduces gradient variance by an order of magnitude, but this does not translate into better FID.

Pretrained diffusion models serve as frozen teachers feeding downstream pipelines such as text-to-3D, single-step distillation, and data attribution. The teacher gradients these pipelines consume are Monte Carlo (MC) expectations over noise levels and Gaussian noise samples; their estimator variance dominates compute cost because each draw requires expensive upstream work (rendering, simulation, encoding). We introduce CARV, a compute-aware variance-accounting framework that motivates a hierarchical MC estimator: amortize the expensive upstream computation over cheap diffusion-noise resamples, sharpened by timestep importance sampling (IS) and a stratified-inverse-CDF construction. In our text-to-3D distillation and attribution experiments, CARV delivers 2-3x effective compute multipliers (most from amortized reuse; ~25% additional from IS plus stratification) without changing the objective; in single-step distillation, the same techniques cut gradient variance by an order of magnitude but do not improve downstream FID, marking the regime where MC variance is no longer the bottleneck.
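The three ingredients named in the abstract (amortized reuse of the upstream computation, timestep importance sampling, and stratified inverse-CDF sampling) can be illustrated with a minimal toy sketch. This is not the authors' implementation: the function names, the `upstream` and `teacher_term` callables, the discrete proposal `q`, the uniform target over timesteps, and the scalar stand-in for a gradient are all assumptions made for illustration.

```python
import bisect
import random


def stratified_timesteps(q, k, rng):
    """Draw k timestep indices from the unnormalized discrete proposal q.

    Stratified inverse-CDF sampling: one uniform draw per stratum
    [i/k, (i+1)/k), mapped through the cumulative sum of q.
    """
    cdf, total = [], 0.0
    for w in q:
        total += w
        cdf.append(total)
    out = []
    for i in range(k):
        u = (i + rng.random()) / k * total
        # The min() guards against a floating-point edge case at u == total.
        out.append(min(bisect.bisect_right(cdf, u), len(q) - 1))
    return out


def hierarchical_estimate(upstream, teacher_term, q, n_outer, k_inner, rng):
    """Toy estimate of E_x E_{t ~ Uniform} E_eps [teacher_term(x, t, eps)].

    upstream(rng) stands in for the expensive step (rendering, simulation,
    encoding) and is called once per outer sample; its result is reused
    across k_inner cheap (timestep, noise) resamples.
    """
    num_steps = len(q)
    z = sum(q)
    acc = 0.0
    for _ in range(n_outer):
        x = upstream(rng)  # expensive: computed once, then amortized
        inner = 0.0
        for t in stratified_timesteps(q, k_inner, rng):
            eps = rng.gauss(0.0, 1.0)  # cheap diffusion-noise resample
            # Importance weight p(t) / q(t) with a uniform target p (assumed).
            weight = (1.0 / num_steps) / (q[t] / z)
            inner += weight * teacher_term(x, t, eps)
        acc += inner / k_inner
    return acc / n_outer


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy problem: constant upstream output, and a term whose
    # magnitude grows with the timestep index.
    estimate = hierarchical_estimate(
        upstream=lambda r: 2.0,
        teacher_term=lambda x, t, eps: x * (t + 1),
        q=[1.0, 2.0, 3.0, 4.0],  # proposal proportional to the term's magnitude
        n_outer=3,
        k_inner=4,
        rng=rng,
    )
    print(estimate)
```

In this toy setup the proposal is exactly proportional to the integrand, so every weighted sample takes the same value and the estimator has zero variance over timesteps. In practice the proposal would only approximate the per-timestep gradient magnitude, and the best ratio of inner resamples to outer samples depends on how expensive the upstream step is relative to a teacher evaluation.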
