Simple Projection-Free Algorithm for Contextual Recommendation with Logarithmic Regret and Robustness
This work provides a more efficient and robust solution for contextual recommendation systems, which is incremental but addresses a key computational bottleneck in prior methods.
The paper tackles the computational inefficiency of existing contextual recommendation algorithms by introducing a simpler method that eliminates the costly Mahalanobis projection step, achieving the same logarithmic regret bound of O(d log T) and adding robustness to suboptimal action feedback.
Contextual recommendation is a variant of contextual linear bandits in which the learner observes an (optimal) action rather than a reward scalar. Recently, Sakaue et al. (2025) developed an efficient Online Newton Step (ONS) approach with an $O(d\log T)$ regret bound, where $d$ is the dimension of the action space and $T$ is the time horizon. In this paper, we present a simple algorithm that is more efficient than the ONS-based method while achieving the same regret guarantee. Our core idea is to exploit the improperness inherent in contextual recommendation, leading to an update rule akin to the second-order perceptron from online classification. This removes the Mahalanobis projection step required by ONS, which is often a major computational bottleneck. More importantly, the same algorithm remains robust to possibly suboptimal action feedback, whereas the prior ONS-based method required running multiple ONS learners with different learning rates for this extension. We describe how our method works in general Hilbert spaces (e.g., via kernelization), where eliminating Mahalanobis projections becomes even more beneficial.