Model Predictive Path Integral Control as Preconditioned Gradient Descent
This work offers theoretical insight into a popular trajectory optimization method. The contribution is incremental: it clarifies existing techniques rather than introducing new ones.
The paper addresses the optimization structure of Model Predictive Path Integral (MPPI) control. It develops a variational interpretation that frames MPPI as preconditioned gradient descent, shows that classical MPPI is recovered exactly with unit step size, and provides a convergence analysis with explicit bounds.
Model Predictive Path Integral (MPPI) control is a popular sampling-based method for trajectory optimization in nonlinear and nonconvex settings, yet its optimization structure remains only partially understood. We develop a variational, optimization-theoretic interpretation of MPPI by lifting constrained trajectory optimization to a KL-regularized problem over distributions and reducing it to a negative log-partition (free-energy) objective over a tractable sampling family. For a general parametric family, this yields a preconditioned gradient method on the distribution parameters and a natural multi-step extension of MPPI. For the fixed-covariance Gaussian family, we show that classical MPPI is recovered exactly as a preconditioned gradient descent step with unit step size. This interpretation enables a direct convergence analysis: under bounded feasible sets, we derive an explicit upper bound on the smoothness constant and a simple sufficient condition guaranteeing descent of exact MPPI. Numerical experiments support the theory and illustrate the effect of key hyperparameters on performance.
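The unit-step-size claim for the fixed-covariance Gaussian family can be checked numerically. The following is a minimal sketch, not the paper's implementation. It assumes the free-energy objective has the form F(mu) = -lam * log E_{u ~ N(mu, Sigma)}[exp(-S(u) / lam)] and that the preconditioner is Sigma / lam; the abstract does not spell out either. The names mppi_weights, preconditioned_step, cost_fn, lam, and step are hypothetical and chosen for illustration only.

```python
import numpy as np


def mppi_weights(costs, lam):
    """Softmax weights proportional to exp(-cost / lam)."""
    # Subtract the minimum cost before exponentiating for numerical stability.
    z = -(costs - costs.min()) / lam
    w = np.exp(z)
    return w / w.sum()


def preconditioned_step(mu, sigma, cost_fn, lam, step, n_samples, rng):
    """One sampled preconditioned gradient step on the Gaussian mean.

    Assumed objective: F(mu) = -lam * log E_{u ~ N(mu, sigma)}[exp(-S(u) / lam)].
    Assumed preconditioner: sigma / lam.
    """
    # Draw perturbations eps_i ~ N(0, sigma) and form samples u_i = mu + eps_i.
    eps = rng.multivariate_normal(np.zeros(mu.size), sigma, size=n_samples)
    u = mu + eps
    costs = np.array([cost_fn(ui) for ui in u])
    w = mppi_weights(costs, lam)

    # Monte Carlo estimate of the gradient:
    # grad F(mu) = -lam * sigma^{-1} * sum_i w_i * eps_i
    grad = -lam * np.linalg.solve(sigma, w @ eps)

    # Preconditioned update: mu <- mu - step * (sigma / lam) @ grad
    new_mu = mu - step * (sigma @ grad) / lam
    return new_mu, u, w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu = np.array([1.0, -2.0, 0.5])
    sigma = np.diag([0.5, 1.0, 2.0])

    # Hypothetical quadratic cost with its minimum at u = (3, 3, 3).
    def cost(u):
        return float(np.sum((u - 3.0) ** 2))

    new_mu, u, w = preconditioned_step(
        mu, sigma, cost, lam=1.0, step=1.0, n_samples=256, rng=rng
    )
    # With unit step size the update equals the weighted sample average.
    print(np.allclose(new_mu, w @ u))
```

With step = 1.0 the returned mean coincides with the weighted sample average w @ u, which is the classical MPPI update. Smaller step sizes interpolate between the current mean and that average, and calling the function repeatedly gives one possible form of a multi-step variant.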