Online Decision Making with History-Average Dependent Costs (Extended)
This work addresses a specific scenario in online learning with history-dependent costs, which is incremental as it builds on existing decision-making frameworks.
The paper tackles the problem of online sequential decision-making where costs depend on the time average of past decisions, by proposing the FTARL algorithm with adaptive regularizers to enforce constraints and achieve tight regret bounds.
In many online sequential decision-making scenarios, a learner's choices affect not just their current costs but also the future ones. In this work, we look at one particular case of such a situation where the costs depend on the time average of past decisions over a history horizon. We first recast this problem with history dependent costs as a problem of decision making under stage-wise constraints. To tackle this, we then propose the novel Follow-The-Adaptively-Regularized-Leader (FTARL) algorithm. Our innovative algorithm incorporates adaptive regularizers that depend explicitly on past decisions, allowing us to enforce stage-wise constraints while simultaneously enabling us to establish tight regret bounds. We also discuss the implications of the length of history horizon on design of no-regret algorithms for our problem and present impossibility results when it is the full learning horizon.