On High-dimensional and Low-rank Tensor Bandits
This work addresses the limitation of linear bandits in modeling high-dimensional structured systems, such as recommender systems, by introducing a tensor-based approach with low-rank assumptions.
The paper tackles the problem of high-dimensional tensor bandits by proposing a novel algorithm, TOFU, which leverages low-rank tensor structure to reduce regret; theoretical analysis shows that it improves the best-known regret upper bound by a multiplicative factor that grows exponentially in the system order.
Most existing studies on linear bandits focus on a one-dimensional characterization of the overall system. Although representative, this formulation may fail to model applications with high-dimensional but favorable structures, such as the low-rank tensor representation for recommender systems. To address this limitation, this work studies a general tensor bandits model, where actions and system parameters are represented by tensors rather than vectors, with a particular focus on the case where the unknown system tensor is low-rank. A novel bandit algorithm, coined TOFU (Tensor Optimism in the Face of Uncertainty), is developed. TOFU first leverages flexible tensor regression techniques to estimate the low-dimensional subspaces associated with the system tensor. These estimates are then used to convert the original problem into a new one with norm constraints on its system parameters. Lastly, TOFU adopts a norm-constrained bandit subroutine, which uses these constraints to avoid exploring the entire high-dimensional parameter space. Theoretical analyses show that TOFU improves the best-known regret upper bound by a multiplicative factor that grows exponentially in the system order. A novel performance lower bound is also established, which further corroborates the efficiency of TOFU.
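To make the setting concrete, the following is a minimal sketch of the tensor reward model and of what "low-dimensional subspaces associated with the system tensor" can mean. It is not the TOFU algorithm. It assumes a Tucker-style low-rank form, an expected reward equal to the inner product of the action tensor and the system tensor, and subspace recovery by a truncated SVD of each mode unfolding applied to the noiseless system tensor (whereas TOFU estimates subspaces from observed rewards via tensor regression). All function names, dimensions, and ranks are hypothetical choices for illustration.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n matricization: rows index the chosen mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `matrix` into axis `mode` of `tensor`."""
    out = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(out, 0, mode)


def make_low_rank_tensor(dims, ranks, rng):
    """Assumed Tucker form: a small core times one orthonormal factor per mode."""
    tensor = rng.standard_normal(ranks)
    factors = []
    for mode, (d, r) in enumerate(zip(dims, ranks)):
        q, _ = np.linalg.qr(rng.standard_normal((d, r)))
        factors.append(q)
        tensor = mode_product(tensor, q, mode)
    return tensor, factors


def expected_reward(action, system):
    """Assumed reward model: tensor inner product <action, system>."""
    return float(np.sum(action * system))


def mode_subspaces(tensor, ranks):
    """Leading left singular vectors of each unfolding (illustrative only)."""
    return [
        np.linalg.svd(unfold(tensor, m), full_matrices=False)[0][:, :r]
        for m, r in enumerate(ranks)
    ]


rng = np.random.default_rng(0)
dims, ranks = (6, 5, 4), (2, 2, 2)  # hypothetical sizes
system, factors = make_low_rank_tensor(dims, ranks, rng)
action = rng.standard_normal(dims)
reward = expected_reward(action, system)

# Projecting onto the recovered subspaces leaves the system tensor unchanged,
# so only the projected coordinates carry information about the reward.
estimated = mode_subspaces(system, ranks)
projected = system
for mode, u in enumerate(estimated):
    projected = mode_product(projected, u @ u.T, mode)

full_params = int(np.prod(dims))
tucker_params = int(np.prod(ranks)) + sum(d * r for d, r in zip(dims, ranks))
print(full_params, tucker_params)
```

In this sketch the full tensor has as many entries as the product of its dimensions, while the Tucker parameterization needs only the core plus one factor matrix per mode; this gap is the structure a low-rank method can exploit instead of exploring the entire parameter space.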