Non-Stationary Representation Learning in Sequential Linear Bandits
This work addresses efficient decision-making in dynamic environments for reinforcement learning and bandit algorithms. Its contribution is incremental: it adapts learned representations to non-stationarity.
The paper studies representation learning for multi-task decision-making in non-stationary environments within the sequential linear bandit framework. It proposes an online algorithm that adaptively learns and transfers non-stationary representations. The authors prove that the algorithm significantly outperforms existing methods that treat tasks independently, and they support this with experiments on synthetic and real data.
In this paper, we study representation learning for multi-task decision-making in non-stationary environments. We consider the framework of sequential linear bandits, where the agent performs a series of tasks drawn from distinct sets associated with different environments. The embeddings of the tasks in each set share a low-dimensional feature extractor, called a representation, and the representations differ across sets. We propose an online algorithm that facilitates efficient decision-making by adaptively learning and transferring non-stationary representations. We prove that our algorithm significantly outperforms existing algorithms that treat tasks independently. We also conduct experiments on both synthetic and real data to validate our theoretical insights and demonstrate the efficacy of our algorithm.
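To make the shared-representation structure concrete, the following is a minimal NumPy sketch. It is not the algorithm proposed in the paper. It assumes each task parameter can be written as theta = B w, where B is a d x k matrix with orthonormal columns shared by all tasks in one set and w is a task-specific k-dimensional vector. All function names, the dimensions, and the noise-free setting (true task parameters are observed directly rather than estimated from bandit feedback) are illustrative assumptions.

```python
import numpy as np

def random_representation(d, k, rng):
    """Hypothetical representation: a d x k matrix with orthonormal columns."""
    q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    return q

def sample_tasks(B, n_tasks, rng):
    """Draw task parameters theta = B @ w, stored one task per column."""
    k = B.shape[1]
    W = rng.standard_normal((k, n_tasks))
    return B @ W

def estimate_representation(Theta, k):
    """Estimate the shared subspace as the top-k left singular vectors."""
    U, _, _ = np.linalg.svd(Theta, full_matrices=False)
    return U[:, :k]

def subspace_distance(B1, B2):
    """Spectral norm of the difference between the two projection matrices."""
    return np.linalg.norm(B1 @ B1.T - B2 @ B2.T, ord=2)

rng = np.random.default_rng(0)
d, k, n_tasks = 20, 3, 10  # assumed ambient dimension, rank, tasks per set

# Two task sets (environments), each with its own representation.
B_a = random_representation(d, k, rng)
B_b = random_representation(d, k, rng)
Theta_a = sample_tasks(B_a, n_tasks, rng)
Theta_b = sample_tasks(B_b, n_tasks, rng)

# Tasks within one set span only a k-dimensional subspace.
rank_a = np.linalg.matrix_rank(Theta_a)

# A representation recovered from set A matches B_a but not B_b.
B_a_hat = estimate_representation(Theta_a, k)
dist_same = subspace_distance(B_a, B_a_hat)
dist_other = subspace_distance(B_b, B_a_hat)

print("rank of set-A task matrix:", rank_a)
print("distance to own representation:", dist_same)
print("distance to other representation:", dist_other)
```

In this toy setting, a representation learned on one set is useful for later tasks from the same set and misleading for tasks from another set, which is the intuition behind learning and transferring representations adaptively when the environment changes.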