Accelerating Diffusion Models in Offline RL via Reward-Aware Consistency Trajectory Distillation
This work addresses the inference-speed bottleneck of diffusion-based decision-making in offline RL, which matters for practical deployment, though it appears to be an incremental improvement on existing consistency distillation techniques.
The paper tackles the slow inference of diffusion models in offline reinforcement learning by proposing a reward-aware consistency trajectory distillation method that enables single-step generation. On the Gym MuJoCo benchmarks and long-horizon planning tasks, the approach is reported to achieve an 8.7% improvement over the previous state of the art and up to a 142x inference-time speedup over its diffusion counterparts.
Although diffusion models have achieved strong results in decision-making tasks, their slow inference speed remains a key limitation. While consistency models offer a potential solution, their applications to decision-making often struggle with suboptimal demonstrations or rely on complex concurrent training of multiple networks. In this work, we propose a novel approach to consistency distillation for offline reinforcement learning that directly incorporates reward optimization into the distillation process. Our method enables single-step generation while maintaining higher performance and simpler training. Empirical evaluations on the Gym MuJoCo benchmarks and long-horizon planning tasks demonstrate that our approach can achieve an 8.7% improvement over the previous state of the art while offering up to a 142x speedup over diffusion counterparts in inference time.
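To make the idea of folding reward optimization into distillation concrete, here is a minimal sketch of one way such an objective could be combined. The abstract does not give the paper's actual loss, so everything here is an assumption for illustration: the names `reward_aware_distillation_loss`, `toy_reward`, and `reward_weight` are hypothetical, the distillation term is a plain squared error against a teacher target, and the reward term is simply subtracted with a fixed weight.

```python
from typing import Callable, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared coordinate differences between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def reward_aware_distillation_loss(
    student_output: Sequence[float],
    teacher_target: Sequence[float],
    reward_fn: Callable[[Sequence[float]], float],
    reward_weight: float,
) -> Tuple[float, float, float]:
    """Hypothetical combined objective (not the paper's exact loss).

    distill: how far the one-step student sample is from the teacher target.
    reward:  predicted reward of the student sample (higher is better).
    total:   distill minus the weighted reward, so minimizing the total pulls
             the student toward the teacher and toward high-reward samples.
    """
    distill = squared_distance(student_output, teacher_target)
    reward = reward_fn(student_output)
    total = distill - reward_weight * reward
    return total, distill, reward


def toy_reward(x: Sequence[float]) -> float:
    """Assumed stand-in reward: maximal (0) when every coordinate equals 1.0."""
    return -sum((v - 1.0) ** 2 for v in x)


# The student matches the teacher exactly, yet the sample has low reward,
# so the reward term alone contributes to the total loss.
total, distill, reward = reward_aware_distillation_loss(
    student_output=[0.5, 0.5],
    teacher_target=[0.5, 0.5],
    reward_fn=toy_reward,
    reward_weight=2.0,
)
print(total, distill, reward)
```

In this toy setting the distillation term vanishes while the reward term still penalizes the sample. This illustrates why a reward term could help when the demonstrations the teacher was trained on are suboptimal. Setting `reward_weight` to zero recovers a plain distillation objective.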