Non-Stochastic Control with Bandit Feedback
The paper addresses a challenging control problem relevant to robotics and autonomous systems, where feedback is limited, though the contribution appears incremental because it builds on existing bandit optimization methods.
The paper tackles the problem of controlling a linear dynamical system with adversarial perturbations using only scalar bandit feedback and an unknown loss function, and gives an efficient algorithm with sublinear regret for both known and unknown systems.
We study the problem of controlling a linear dynamical system with adversarial perturbations where the only feedback available to the controller is the scalar loss, and the loss function itself is unknown. For this problem, with either a known or unknown system, we give an efficient sublinear regret algorithm. The main algorithmic difficulty is the dependence of the loss on past controls. To overcome this issue, we propose an efficient algorithm for the general setting of bandit convex optimization for loss functions with memory, which may be of independent interest.
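To make the "dependence of the loss on past controls" concrete, here is a minimal sketch on a toy scalar system. It is an illustration only, not the paper's algorithm: the dynamics x_{t+1} = a*x_t + b*u_t + w_t, the quadratic cost, the constants a = 0.5 and b = 1.0, and the function name rollout_losses are all assumptions chosen for readability. The perturbations are set to zero so the effect of a single control is easy to read off.

```python
def rollout_losses(controls, perturbations, a=0.5, b=1.0, x0=0.0):
    """Return the scalar losses along x_{t+1} = a*x_t + b*u_t + w_t.

    The quadratic cost x_t**2 + u_t**2 is a hypothetical choice; in the
    bandit setting the controller would see only these scalar values,
    not the cost function itself.
    """
    x = x0
    losses = []
    for u, w in zip(controls, perturbations):
        losses.append(x * x + u * u)  # scalar feedback at time t
        x = a * x + b * u + w         # state carries past controls forward
    return losses


# Two control sequences that differ only at time 0.
w = [0.0, 0.0, 0.0, 0.0]
base = rollout_losses([0.0, 0.0, 0.0, 0.0], w)
bumped = rollout_losses([1.0, 0.0, 0.0, 0.0], w)

# The single change at time 0 shows up in every later loss.
diffs = [p - q for p, q in zip(bumped, base)]
print(diffs)  # [1.0, 1.0, 0.25, 0.0625]
```

Because the state accumulates earlier controls, the loss observed at time t is effectively a function of a window of past decisions rather than of the current one alone. In this toy system the influence decays geometrically since |a| < 1, which is the kind of structure that makes a "loss with memory" formulation natural.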