The Cost of Learning under Multiple Change Points
This work addresses a critical problem for online learning systems in dynamic environments with multiple shifts, offering a foundational algorithmic improvement rather than an incremental one.
The paper tackles online learning in environments with multiple change points. It shows that classical methods can fail catastrophically due to endogenous confounding, and proposes Anytime Tracking CUSUM (ATC) algorithms whose regret is nearly minimax-optimal, closely matching a novel information-theoretic lower bound.
We consider an online learning problem in environments with multiple change points. In contrast to the single change point problem, which is widely studied using classical "high confidence" detection schemes, the multiple change point environment presents new learning-theoretic and algorithmic challenges. Specifically, we show that classical methods may exhibit catastrophic failure (high regret) due to a phenomenon we refer to as endogenous confounding. To overcome this, we propose a new class of learning algorithms dubbed Anytime Tracking CUSUM (ATC). These are horizon-free online algorithms that implement a selective detection principle, balancing the need to ignore "small" (hard-to-detect) shifts against the need to react "quickly" to significant ones. We prove that the performance of a properly tuned ATC algorithm is nearly minimax-optimal: its regret is guaranteed to closely match a novel information-theoretic lower bound on the achievable performance of any learning algorithm in the multiple change point problem. Experiments on synthetic as well as real-world data validate these theoretical findings.
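To make the selective detection idea concrete, the following is a minimal sketch of a textbook two-sided CUSUM detector with restarts, applied to tracking a piecewise-constant mean. It is not the ATC algorithm from the paper, whose statistic and tuning are not given in the abstract. The function name `cusum_tracker` and the parameters `allowance` and `threshold` are assumptions chosen for illustration: `allowance` sets the size of shift the detector deliberately ignores, and `threshold` sets how much accumulated evidence triggers a restart.

```python
def cusum_tracker(xs, allowance=0.5, threshold=8.0):
    """Track a piecewise-constant mean using a two-sided CUSUM with restarts.

    Illustrative sketch only (classical CUSUM, not the paper's ATC).
    `allowance` and `threshold` are hypothetical tuning parameters.
    Returns the prediction made before each sample and the detection times.
    """
    estimates, detections = [], []
    mean, n = 0.0, 0        # running mean of the current segment and its size
    pos, neg = 0.0, 0.0     # evidence for an upward / downward shift

    for t, x in enumerate(xs):
        estimates.append(mean)          # predict before observing x
        if n > 0:
            resid = x - mean
            # Residuals smaller than `allowance` never accumulate,
            # so small shifts are ignored by construction.
            pos = max(0.0, pos + resid - allowance)
            neg = max(0.0, neg - resid - allowance)

        if pos > threshold or neg > threshold:
            # Significant shift: discard the old segment and restart.
            detections.append(t)
            mean, n = x, 1
            pos = neg = 0.0
        else:
            # No detection: fold the sample into the running mean.
            n += 1
            mean += (x - mean) / n

    return estimates, detections


# Deterministic toy stream: a small shift (0.0 -> 0.2) at t = 100
# and a large shift (0.2 -> 5.0) at t = 200.
wiggle = [0.2 if i % 2 == 0 else -0.2 for i in range(100)]
stream = ([0.0 + w for w in wiggle]
          + [0.2 + w for w in wiggle]
          + [5.0 + w for w in wiggle])
estimates, detections = cusum_tracker(stream)
print("detection times:", detections)
```

In this toy stream the small shift stays below the allowance and is absorbed into the running mean, while the large shift crosses the threshold within a few samples. Fixed constants are used here only for simplicity; a horizon-free method such as the one the abstract describes would presumably need to set them without knowledge of the horizon.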