Transformers As Generalizable Optimal Controllers
This work addresses the challenge of generalizable optimal control for heterogeneous systems and offers a practical approximator for near-optimal feedback laws, though the contribution is incremental because it builds on existing transformer and LQR methods.
The paper tackles the problem of learning a single controller for multiple MIMO LTI systems by training a transformer policy on LQR trajectories. The policy achieves empirically small sub-optimality relative to LQR and remains stabilizing under perturbations.
We study whether optimal state-feedback laws for a family of heterogeneous Multiple-Input, Multiple-Output (MIMO) Linear Time-Invariant (LTI) systems can be captured by a single learned controller. We train one transformer policy on trajectories generated by Linear Quadratic Regulator (LQR) controllers for systems with different state and input dimensions, using a shared representation with standardization, padding, dimension encoding, and masked loss. The policy maps recent state history to control actions without requiring plant matrices at inference time. Across a broad set of systems, it achieves empirically small sub-optimality relative to LQR, remains stabilizing under moderate parameter perturbations, and benefits from lightweight fine-tuning on unseen systems. These results support transformer policies as practical approximators of near-optimal feedback laws over structured linear-system families.
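To make the shared representation concrete, the following is a minimal sketch of how LQR trajectories from systems of different sizes might be padded to a common width and scored with a masked loss. It is not the authors' pipeline. The discrete-time formulation, the identity cost weights, the two example plants, and all function names (`dlqr_gain`, `rollout`, `pad_with_mask`, `masked_mse`) are assumptions made for illustration. Standardization and dimension encoding are omitted.

```python
import numpy as np

def dlqr_gain(A, B, Q, R, iters=500):
    """Discrete-time LQR gain K (control law u = -K x), via Riccati fixed-point iteration."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def rollout(A, B, K, x0, steps):
    """Simulate x_{t+1} = A x_t + B u_t under u_t = -K x_t; return states and actions."""
    xs, us = [], []
    x = x0
    for _ in range(steps):
        u = -K @ x
        xs.append(x)
        us.append(u)
        x = A @ x + B @ u
    return np.array(xs), np.array(us)

def pad_with_mask(arr, width):
    """Zero-pad a (T, d) array to (T, width); the mask is 1 on real entries, 0 on padding."""
    T, d = arr.shape
    out = np.zeros((T, width))
    mask = np.zeros((T, width))
    out[:, :d] = arr
    mask[:, :d] = 1.0
    return out, mask

def masked_mse(pred, target, mask):
    """Mean squared error over unmasked entries only, so padding never contributes."""
    return float((((pred - target) ** 2) * mask).sum() / mask.sum())

# Two hypothetical plants with different state/input dimensions (assumed, not from the paper).
systems = [
    (np.array([[1.0, 0.1], [0.0, 1.0]]), np.array([[0.005], [0.1]])),   # n=2, m=1
    (np.array([[0.9, 0.2, 0.0], [0.0, 1.05, 0.1], [0.0, 0.0, 0.8]]),
     np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])),                   # n=3, m=2
]
N_MAX, M_MAX, T = 3, 2, 50  # common padded widths and trajectory length

dataset = []
for A, B in systems:
    n, m = B.shape
    K = dlqr_gain(A, B, np.eye(n), np.eye(m))        # assumed weights Q = I, R = I
    xs, us = rollout(A, B, K, np.ones(n), T)
    x_pad, _ = pad_with_mask(xs, N_MAX)              # model input (states)
    u_pad, u_mask = pad_with_mask(us, M_MAX)         # regression target and its mask
    dataset.append((x_pad, u_pad, u_mask))
```

In this sketch a learned policy would receive `x_pad` and be trained to match `u_pad` under `masked_mse`, so that errors in padded action channels of the smaller system do not affect the gradient.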