Online non-convex optimization with imperfect feedback
This work addresses the problem of optimizing non-convex functions in online settings with limited feedback, which is incremental as it builds on existing regret minimization methods.
The paper tackles online learning with non-convex losses under imperfect feedback, proposing a dual averaging-based policy to achieve tight regret minimization guarantees for both static and dynamic regret. It applies this framework to cases where only realized losses are available, using a kernel-based estimator to generate inexact models.
We consider the problem of online learning with non-convex losses. In terms of feedback, we assume that the learner observes - or otherwise constructs - an inexact model for the loss function encountered at each stage, and we propose a mixed-strategy learning policy based on dual averaging. In this general context, we derive a series of tight regret minimization guarantees, both for the learner's static (external) regret, as well as the regret incurred against the best dynamic policy in hindsight. Subsequently, we apply this general template to the case where the learner only has access to the actual loss incurred at each stage of the process. This is achieved by means of a kernel-based estimator which generates an inexact model for each round's loss function using only the learner's realized losses as input.