ML | LG | Nov 11, 2025

Optimal control of the future via prospective learning with control

Yuxin Bai, Aranyak Acharyya, Ashwin De Silva, Zeyu Shen, James Hassett, Joshua T. Vogelstein

arXiv:2511.08717v24.51 citations | h-index: 49

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of applying AI to realistic, non-stationary control problems, offering a novel approach that could benefit mobile agents and robotics. The advance appears incremental because it builds on supervised learning rather than introducing a completely new paradigm.

The paper tackles optimal control in non-stationary, reset-free environments by extending supervised learning into a framework called Prospective Learning with Control (PL+C). The authors prove that, under general assumptions, empirical risk minimization asymptotically achieves the Bayes optimal policy. They also demonstrate that their agents are orders of magnitude more efficient than modern RL algorithms on tasks such as foraging.

Optimal control of the future is the next frontier for AI. Current approaches to this problem are typically rooted in reinforcement learning (RL). While powerful, this learning framework is mathematically distinct from supervised learning, which has been the main workhorse for the recent achievements in AI. Moreover, RL typically operates in a stationary environment with episodic resets, limiting its utility in more realistic settings. Here, we extend supervised learning to address learning to control in non-stationary, reset-free environments. Using this framework, called "Prospective Learning with Control (PL+C)", we prove that under certain fairly general assumptions, empirical risk minimization (ERM) asymptotically achieves the Bayes optimal policy. We then consider a specific instance of prospective learning with control, foraging, which is a canonical task for any mobile agent, be it natural or artificial. We illustrate that modern RL algorithms fail to learn in these non-stationary, reset-free environments, and even with modifications, they are orders of magnitude less efficient than our prospective foraging agents.
