LG AI · Oct 28, 2025

LRT-Diffusion: Calibrated Risk-Aware Guidance for Diffusion Policies

arXiv:2510.24983v1

Originality: Highly original

AI Analysis

The paper provides a principled, calibrated risk-control method for diffusion policies in offline RL, addressing a specific bottleneck in safety and interpretability.

The paper addresses the lack of statistical risk control in diffusion policies for offline reinforcement learning. It introduces LRT-Diffusion, a risk-aware sampling rule that calibrates guidance to meet a user-specified Type-I error level and improves the return-OOD trade-off on D4RL MuJoCo tasks.

Diffusion policies are competitive for offline reinforcement learning (RL) but are typically guided at sampling time by heuristics that lack a statistical notion of risk. We introduce LRT-Diffusion, a risk-aware sampling rule that treats each denoising step as a sequential hypothesis test between the unconditional prior and the state-conditional policy head. Concretely, we accumulate a log-likelihood ratio and gate the conditional mean with a logistic controller whose threshold tau is calibrated once under H0 to meet a user-specified Type-I level alpha. This turns guidance from a fixed push into an evidence-driven adjustment with a user-interpretable risk budget. Importantly, we deliberately leave training vanilla (two heads with standard epsilon-prediction) under the structure of DDPM. LRT guidance composes naturally with Q-gradients: critic-gradient updates can be taken at the unconditional mean, at the LRT-gated mean, or a blend, exposing a continuum from exploitation to conservatism. We standardize states and actions consistently at train and test time and report a state-conditional out-of-distribution (OOD) metric alongside return. On D4RL MuJoCo tasks, LRT-Diffusion improves the return-OOD trade-off over strong Q-guided baselines in our implementation while honoring the desired alpha. Theoretically, we establish level-alpha calibration, concise stability bounds, and a return comparison showing when LRT surpasses Q-guidance, especially when off-support errors dominate. Overall, LRT-Diffusion is a drop-in, inference-time method that adds principled, calibrated risk control to diffusion policies for offline RL.
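
The gating idea in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the equal-variance Gaussian form of the per-step log-likelihood ratio, the `temperature` parameter, the quantile-based calibration of tau, and all function names are assumptions made for illustration. The abstract only states that a log-likelihood ratio is accumulated, that a logistic controller gates the conditional mean, and that tau is calibrated once under H0 to a Type-I level alpha.

```python
import numpy as np


def gaussian_llr(x, mu_c, mu_u, sigma):
    """Per-step log-likelihood ratio log N(x; mu_c, s^2 I) - log N(x; mu_u, s^2 I).

    Assumes (hypothetically) that both heads share the same isotropic variance,
    so the normalizing constants cancel. Sums over the last (action) axis.
    """
    d_u = np.sum((x - mu_u) ** 2, axis=-1)
    d_c = np.sum((x - mu_c) ** 2, axis=-1)
    return (d_u - d_c) / (2.0 * sigma ** 2)


def simulate_cumulative_llr(rng, n, steps, dim, delta, sigma, under_h1=False):
    """Toy trajectories: unconditional mean 0, conditional mean `delta` per dim.

    Under H0 samples come from the unconditional head; under H1 from the
    conditional head. Returns the LLR accumulated over all steps, shape (n,).
    """
    mu_u = np.zeros(dim)
    mu_c = np.full(dim, delta)
    mean = mu_c if under_h1 else mu_u
    x = mean + sigma * rng.standard_normal((n, steps, dim))
    return gaussian_llr(x, mu_c, mu_u, sigma).sum(axis=1)


def calibrate_tau(h0_llrs, alpha):
    """Pick tau as the (1 - alpha) quantile of H0 statistics (an assumed recipe),
    so roughly a fraction alpha of H0 trajectories exceed it."""
    return float(np.quantile(h0_llrs, 1.0 - alpha))


def logistic_gate(llr, tau, temperature=1.0):
    """Soft gate in (0, 1): near 0 when evidence is below tau, near 1 above it."""
    return 1.0 / (1.0 + np.exp(-(np.asarray(llr) - tau) / temperature))


def gated_mean(mu_u, mu_c, gate):
    """Interpolate from the unconditional mean toward the conditional mean."""
    gate = np.asarray(gate)[..., None]
    return mu_u + gate * (mu_c - mu_u)
```

In this sketch, one would draw H0 statistics with `simulate_cumulative_llr`, pass them to `calibrate_tau` with the desired `alpha`, and then use `logistic_gate` on the running log-likelihood ratio inside the sampler to form `gated_mean` at each denoising step. A real diffusion policy would replace the toy means with the outputs of the two epsilon-prediction heads.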
