SY LG | Apr 2, 2025

Learning with Imperfect Models: When Multi-step Prediction Mitigates Compounding Error

Anne Somalwar, Bruce D. Lee, George J. Pappas, Nikolai Matni

arXiv:2504.01766v1 | 5.96 citations | h-index: 8 | CDC

Originality: Incremental advance

AI Analysis

This work addresses a key challenge in model-based reinforcement learning and imitation learning, offering theoretical insights for practitioners, though it is incremental as it builds on existing approaches.

The paper analyzes when multi-step prediction outperforms single-step models in learning-based control, where compounding error is a central problem. It shows that direct multi-step predictors reduce bias when the model class is misspecified due to partial observability, while single-step models achieve lower asymptotic prediction error when the model class is well-specified.

Compounding error, where small prediction mistakes accumulate over time, presents a major challenge in learning-based control. For example, this issue often limits the performance of model-based reinforcement learning and imitation learning. One common approach to mitigate compounding error is to train multi-step predictors directly, rather than relying on autoregressive rollout of a single-step model. However, it is not well understood when the benefits of multi-step prediction outweigh the added complexity of learning a more complicated model. In this work, we provide a rigorous analysis of this trade-off in the context of linear dynamical systems. We show that when the model class is well-specified and accurately captures the system dynamics, single-step models achieve lower asymptotic prediction error. On the other hand, when the model class is misspecified due to partial observability, direct multi-step predictors can significantly reduce bias and thus outperform single-step approaches. These theoretical results are supported by numerical experiments, wherein we also (a) empirically evaluate an intermediate strategy which trains a single-step model using a multi-step loss and (b) evaluate the performance of single-step and multi-step predictors in a closed-loop control setting.
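The contrast between autoregressive rollout and direct multi-step prediction can be illustrated with a minimal sketch. It is not the authors' code or experimental setup: the system matrices, noise level, horizon, and all function names below are hypothetical choices made for illustration. A two-state linear system is observed only through its first coordinate, and both predictors are restricted to a scalar linear map from the current observation, so the model class is misspecified by partial observability. Errors are measured on the same trajectory used for fitting.

```python
import numpy as np


def simulate(A, C, T, noise_std, rng):
    """Simulate x_{t+1} = A x_t + w_t and return the scalar observations y_t = C x_t."""
    n = A.shape[0]
    x = np.zeros(n)
    ys = np.empty(T)
    for t in range(T):
        ys[t] = C @ x
        x = A @ x + noise_std * rng.standard_normal(n)
    return ys


def fit_scalar(inputs, targets):
    """Least-squares coefficient for targets ~ coef * inputs (no intercept)."""
    return float(inputs @ targets / (inputs @ inputs))


def compare(ys, h):
    """Return (rollout_mse, direct_mse) for horizon h >= 1, measured in-sample."""
    # Single-step model y_{t+1} ~ a * y_t, rolled out h times: y_{t+h} ~ a^h * y_t.
    a_hat = fit_scalar(ys[:-1], ys[1:])
    # Direct multi-step model fitted at the target horizon: y_{t+h} ~ b * y_t.
    b_hat = fit_scalar(ys[:-h], ys[h:])

    rollout_pred = (a_hat ** h) * ys[:-h]
    direct_pred = b_hat * ys[:-h]
    rollout_mse = float(np.mean((ys[h:] - rollout_pred) ** 2))
    direct_mse = float(np.mean((ys[h:] - direct_pred) ** 2))
    return rollout_mse, direct_mse


# Hypothetical example system: two hidden states, only the first is observed.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.5],
              [0.0, 0.7]])
C = np.array([1.0, 0.0])
ys = simulate(A, C, T=5000, noise_std=0.1, rng=rng)

rollout_mse, direct_mse = compare(ys, h=5)
print(f"h=5 rollout MSE: {rollout_mse:.6f}")
print(f"h=5 direct  MSE: {direct_mse:.6f}")
```

Because the direct predictor is the least-squares optimum over the same scalar model class at horizon h, its in-sample error cannot exceed that of the rolled-out single-step model. The sketch therefore only illustrates the bias side of the trade-off under partial observability. It does not show the well-specified case, where the abstract states that single-step models achieve lower asymptotic prediction error.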
