LG SY OC · Nov 12, 2025

Quasi-Newton Compatible Actor-Critic for Deterministic Policies

Arash Bahari Kordabad, Dean Brandner, Sebastien Gros, Sergio Lucia, Sadegh Soudjani

arXiv:2511.09509v14.11 citations · h-index: 5

Originality: Incremental advance

AI Analysis

This work targets slow convergence in reinforcement learning with deterministic policies. The advance is incremental, since it builds on existing actor-critic methods.

The paper tackles slow convergence in deterministic policy gradient methods by proposing a second-order actor-critic framework that uses curvature information. The authors report faster convergence and improved performance over standard deterministic actor-critic baselines.

In this paper, we propose a second-order deterministic actor-critic framework in reinforcement learning that extends the classical deterministic policy gradient method to exploit curvature information of the performance function. Building on the concept of compatible function approximation for the critic, we introduce a quadratic critic that simultaneously preserves the true policy gradient and an approximation of the performance Hessian. A least-squares temporal difference learning scheme is then developed to estimate the quadratic critic parameters efficiently. This construction enables a quasi-Newton actor update using information learned by the critic, yielding faster convergence compared to first-order methods. The proposed approach is general and applicable to any differentiable policy class. Numerical examples demonstrate that the method achieves improved convergence and performance over standard deterministic actor-critic baselines.
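To illustrate why curvature information can speed up the actor update, the following is a minimal sketch of a damped Newton-type ascent step compared with plain gradient ascent. It is not the paper's algorithm: in the paper the gradient and Hessian approximation are learned by the quadratic critic, whereas here they are computed analytically for a hypothetical, ill-conditioned quadratic performance function. The names `quasi_newton_step`, `grad_J`, `hess_J`, `A`, and `theta_star` are assumptions made for this example.

```python
import numpy as np

def quasi_newton_step(theta, grad, hess, step=1.0, damping=1e-6):
    """One damped Newton-type ascent step for maximizing J(theta).

    grad and hess stand in for the critic's estimates (assumption:
    here they are supplied directly instead of being learned).
    """
    # For maximization, -hess should be positive definite near an optimum.
    neg_h = -0.5 * (hess + hess.T)  # symmetrize the estimate
    eigvals, eigvecs = np.linalg.eigh(neg_h)
    # Take absolute eigenvalues and floor them so the curvature matrix
    # is positive definite and the step is an ascent direction.
    eigvals = np.maximum(np.abs(eigvals), damping)
    curvature = (eigvecs * eigvals) @ eigvecs.T
    return theta + step * np.linalg.solve(curvature, grad)

# Hypothetical toy objective: J(theta) = -0.5 (theta - theta*)^T A (theta - theta*)
A = np.diag([100.0, 1.0])  # condition number 100
theta_star = np.array([1.0, -2.0])

def grad_J(theta):
    return -A @ (theta - theta_star)

def hess_J(theta):
    return -A

# Second-order update: a single step from the origin.
theta0 = np.zeros(2)
theta_newton = quasi_newton_step(theta0, grad_J(theta0), hess_J(theta0))

# First-order baseline: 50 gradient-ascent steps with a stable step size.
theta_gd = np.zeros(2)
for _ in range(50):
    theta_gd = theta_gd + 0.01 * grad_J(theta_gd)

print("Newton error:", np.linalg.norm(theta_newton - theta_star))
print("Gradient-ascent error:", np.linalg.norm(theta_gd - theta_star))
```

On this quadratic the curvature-scaled step reaches the maximizer in one update, while gradient ascent is still far from it along the low-curvature direction after 50 updates. With learned, noisy estimates the gain would be less clear-cut, which is why the sketch floors the eigenvalues of the curvature matrix.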
