Online Learning for Supervisory Switching Control

arXiv:2603.14762 · 13.6 · h-index: 67

Predicted impact: top 18% in OC · last 90 days · Originality: Highly original

AI Analysis

The paper addresses a gap in control theory that matters to practitioners: it provides non-asymptotic performance bounds for settings where existing methods break down because they rely on restrictive assumptions such as system stability.

The paper tackles the problem of supervisory switching control for partially-observed linear dynamical systems by proposing a data-driven algorithm that identifies the best controller among candidates, including potentially destabilizing ones, with finite-time guarantees. It achieves this in O(N log N) steps while ensuring finite L2-gain against disturbances.

We study supervisory switching control for partially-observed linear dynamical systems. The objective is to identify and deploy the best controller for the unknown system by periodically selecting among a collection of $N$ candidate controllers, some of which may destabilize the underlying system. While classical estimator-based supervisory control guarantees asymptotic stability, it lacks quantitative finite-time performance bounds. Conversely, current non-asymptotic methods in both online learning and system identification require restrictive assumptions that are incompatible in a control setting, such as system stability, which preclude testing potentially unstable controllers. To bridge this gap, we propose a novel, non-asymptotic analysis of supervisory control that adapts multi-armed bandit algorithms to address these control-theoretic challenges. Our data-driven algorithm evaluates candidate controllers via scoring criteria that leverage system observability to isolate the effects of historical states, enabling both detection of destabilizing controllers and accurate system identification. We present two algorithmic variants with dimension-free, finite-time guarantees, where each identifies the most suitable controller in $\mathcal{O}(N \log N)$ steps, while simultaneously achieving finite $L_2$-gain with respect to system disturbances.
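To make the selection problem concrete, here is a minimal Python sketch of an elimination-style bandit loop over candidate controllers. It is not the paper's algorithm. It assumes a fully observed scalar system, static feedback gains, a bounded disturbance, and a reset of the state to a known value before every trial, whereas the abstract describes partially observed systems where the effect of historical states must be isolated rather than reset away. The names `run_episode` and `select_controller`, the threshold `blowup`, and all numeric values are hypothetical choices for illustration.

```python
import random


def run_episode(a, b, gain, steps, rng, x0=1.0, noise_bound=0.05):
    """Simulate x[t+1] = a*x[t] + b*u[t] + w[t] with u[t] = -gain*x[t].

    Returns (final state, mean squared state over the episode).
    Assumption: the state is reset to x0 at the start of every episode.
    """
    x, cost = x0, 0.0
    for _ in range(steps):
        w = rng.uniform(-noise_bound, noise_bound)  # bounded disturbance
        x = (a - b * gain) * x + w
        cost += x * x
    return x, cost / steps


def select_controller(a, b, gains, episode_len=20, blowup=10.0, seed=0):
    """Elimination-style selection among candidate gains (toy illustration).

    Returns (index of the surviving gain or None, flagged indices, steps used).
    """
    rng = random.Random(seed)
    active = list(range(len(gains)))
    totals = [0.0] * len(gains)
    pulls = [0] * len(gains)
    flagged, steps_used = [], 0

    while len(active) > 1:
        survivors = []
        for i in active:
            x_end, cost = run_episode(a, b, gains[i], episode_len, rng)
            steps_used += episode_len
            if abs(x_end) > blowup:
                flagged.append(i)  # treated as destabilizing and dropped
                continue
            totals[i] += cost
            pulls[i] += 1
            survivors.append(i)
        if len(survivors) > 1:
            # Drop the survivor with the worst running mean cost.
            worst = max(survivors, key=lambda i: totals[i] / pulls[i])
            survivors.remove(worst)
        active = survivors

    best = active[0] if active else None
    return best, flagged, steps_used


best, flagged, steps_used = select_controller(
    a=1.2, b=1.0, gains=[0.0, 0.5, 1.1, 2.5]
)
print(best, flagged, steps_used)
```

With these toy values the closed-loop poles are 1.2, 0.7, 0.1, and -1.3, so the first and last gains are flagged as destabilizing and the gain 1.1 survives. The sketch drops one candidate per round, which is simpler than the scoring criteria described in the abstract and carries none of its finite-time or $L_2$-gain guarantees.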
