LG · Jan 15

DeFlow: Decoupling Manifold Modeling and Value Maximization for Offline Policy Extraction

arXiv:2601.10471v2 · 2 citations · h-index: 6

Originality: Highly original

AI Analysis

DeFlow addresses the computational cost of optimizing generative policies in offline RL, offering researchers and practitioners a more stable and expressive approach.

The paper tackles the computational challenge of optimizing generative policies in offline RL by proposing DeFlow, a framework that decouples manifold modeling from value maximization, achieving superior performance on the OGBench benchmark.

We present DeFlow, a decoupled offline RL framework that leverages flow matching to faithfully capture complex behavior manifolds. Optimizing generative policies is computationally prohibitive, typically necessitating backpropagation through ODE solvers. We address this by learning a lightweight refinement module within an explicit, data-derived trust region of the flow manifold, rather than sacrificing the iterative generation capability via single-step distillation. This way, we bypass solver differentiation and eliminate the need for balancing loss terms, ensuring stable improvement while fully preserving the flow's iterative expressivity. Empirically, DeFlow achieves superior performance on the challenging OGBench benchmark and demonstrates efficient offline-to-online adaptation.
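The decoupling described in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it is a toy under stated assumptions. The flow policy is sampled with a plain forward-Euler solver and its output is treated as a constant, so nothing is differentiated through the solver. The refinement is a residual constrained to an L2 ball around the sampled action. The fixed `radius`, the toy velocity field, the quadratic critic, and the per-sample projected gradient ascent (standing in for a learned refinement module) are all hypothetical choices made for illustration.

```python
import numpy as np

def euler_sample(velocity_fn, state, noise, n_steps=10):
    """Integrate da/dt = v(a, t, state) from t=0 to t=1 with forward Euler."""
    action = noise.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        action = action + dt * velocity_fn(action, k * dt, state)
    return action

def project_to_ball(delta, radius):
    """Project delta onto the L2 ball of the given radius."""
    norm = np.linalg.norm(delta)
    if norm <= radius:
        return delta
    return delta * (radius / norm)

def refine_action(base_action, q_grad_fn, state, radius, lr=0.1, n_iters=50):
    """Projected gradient ascent on Q over a bounded residual.

    base_action is a fixed input here, so no gradient flows
    back through the ODE solver that produced it.
    """
    delta = np.zeros_like(base_action)
    for _ in range(n_iters):
        delta = delta + lr * q_grad_fn(state, base_action + delta)
        delta = project_to_ball(delta, radius)
    return base_action + delta

# Toy setup (all hypothetical): behavior actions are 0.5 * state,
# while the critic prefers actions equal to the state.
def toy_velocity(action, t, state):
    # Straight-line flow toward the behavior action 0.5 * state.
    return (0.5 * state - action) / (1.0 - t)

def toy_q(state, action):
    return -np.sum((action - state) ** 2)

def toy_q_grad(state, action):
    # Gradient of toy_q with respect to the action.
    return -2.0 * (action - state)

rng = np.random.default_rng(0)
state = np.array([1.0, -1.0])
radius = 0.1  # assumed trust-region size; the paper derives its region from data

base = euler_sample(toy_velocity, state, rng.standard_normal(2))
refined = refine_action(base, toy_q_grad, state, radius)
```

In this toy, the refined action has a higher critic value than the flow sample while staying within `radius` of it, which is the qualitative behavior the abstract describes: value improvement confined to a neighborhood of the behavior manifold.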
