ELVIS: Ensemble-Calibrated Latent Imagination for Long-Horizon Visual MPC

Yurui Du, Pinhao Song, Yutong Hu, Renaud Detry

arXiv:2605.04709

AI Analysis

For researchers in model-based RL and visual control, ELVIS addresses the bottleneck of long-horizon planning under compounding errors and multi-modal futures, offering a practical solution with strong empirical results.

ELVIS introduces a latent model predictive controller that uses Gaussian-mixture MPPI and uncertainty-aware lambda-return to enable long-horizon visual MPC, achieving state-of-the-art results on 14 DeepMind Control Suite tasks and zero-shot transfer to a real-world occluded sand-spraying task with improved surface-quality metrics.

A central challenge of visual control with model-based reinforcement learning (RL) is reliable long-horizon planning: long rollouts with learned latent dynamics exhibit branching futures and multi-modal action-value distributions. In addition, compounding model errors amplified by visual occlusions make deep imagination brittle. We present ELVIS, a latent model predictive controller (MPC) designed to make long-horizon planning practical. ELVIS plans in a Dreamer-style recurrent state space model (RSSM) and replaces standard unimodal model predictive path integral (MPPI) with a Gaussian-mixture MPPI that maintains multiple coherent hypotheses over long horizons, avoiding mode averaging under branching rollouts. In parallel, ELVIS stabilizes deep imagination with a shared uncertainty-aware lambda-return: an ensemble of latent critics defines an upper-confidence-bound (UCB) score that gates a time-varying lambda, adaptively trading off bootstrapping versus look-ahead to limit compounding error during planning. The same return is used both to train an actor-critic prior from imagined rollouts and to score candidate trajectories inside GMM-MPPI, aligning RL objectives with the planner's long-horizon optimization. On fourteen DeepMind Control Suite visual tasks, ELVIS establishes state-of-the-art performance compared with TD-MPC2 and DreamerV3. Finally, ELVIS transfers zero-shot to a real-world sand-spraying task with severe occlusions, improving surface-quality metrics and demonstrating robustness beyond simulation.
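The uncertainty-aware lambda-return described in the abstract can be illustrated with a small sketch. The abstract does not give the gating function, so everything specific here is an assumption made for illustration: the gate uses the UCB bonus (the gap between the UCB score and the ensemble mean), larger critic disagreement lowers lambda (more bootstrapping, less look-ahead), and the ensemble mean is used as the bootstrap value. The names `ucb_bonus_gate`, `lambda_return`, `beta`, `lam_max`, and `scale` are hypothetical and not taken from the paper.

```python
import math
from statistics import mean, pstdev


def ucb_bonus_gate(ensemble_values, beta=1.0, lam_max=0.95, scale=1.0):
    """Turn per-step critic ensembles into bootstrap values and lambdas.

    ensemble_values[t] holds every critic's estimate for imagined step t.
    ASSUMPTION: lambda decays exponentially with the UCB bonus; the
    actual ELVIS gate is not specified in the abstract.
    """
    means, lambdas = [], []
    for estimates in ensemble_values:
        mu = mean(estimates)
        ucb = mu + beta * pstdev(estimates)  # upper-confidence-bound score
        bonus = ucb - mu                     # grows with critic disagreement
        means.append(mu)
        lambdas.append(lam_max * math.exp(-bonus / scale))
    return means, lambdas


def lambda_return(rewards, values, lambdas, gamma=0.99):
    """Lambda-return with a time-varying lambda, computed backwards.

    rewards has length H; values and lambdas have length H + 1.
    G_t = r_t + gamma * ((1 - lam_{t+1}) * V_{t+1} + lam_{t+1} * G_{t+1}),
    with G_H = V_H as the final bootstrap.
    """
    horizon = len(rewards)
    if len(values) != horizon + 1 or len(lambdas) != horizon + 1:
        raise ValueError("values and lambdas must have length len(rewards) + 1")
    returns = [0.0] * horizon
    next_return = values[horizon]
    for t in reversed(range(horizon)):
        lam = lambdas[t + 1]
        target = (1.0 - lam) * values[t + 1] + lam * next_return
        returns[t] = rewards[t] + gamma * target
        next_return = returns[t]
    return returns


# Usage: three imagined steps, two rewards, a two-member critic ensemble.
ensemble = [[0.5, 0.5], [1.0, 1.0], [2.0, 4.0]]
values, lambdas = ucb_bonus_gate(ensemble)
returns = lambda_return([1.0, 2.0], values, lambdas, gamma=0.5)
```

With this gate, steps where the critics agree keep lambda near `lam_max` and rely on the imagined rollout, while steps where they disagree fall back toward the one-step bootstrap. Under the abstract's description, the same return would serve both as the actor-critic training target and as the trajectory score inside the planner.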
