LG · May 26
Agile Online Model Selection: Resolving Adaptation Lag via Safeguarded Large Learning Rates
Kei Takemura, Ryuta Matsuno, Keita Sakuma
Maintaining predictive accuracy in non-stationary environments requires online model selection that adapts autonomously to unknown distribution shifts. However, existing tuning-free algorithms face a fundamental trade-off between robustness and agility. Specifically, to ensure dynamic regret bounds, they must restrict learning rates to small constants (e.g., $O(1)$). This restriction inevitably causes significant adaptation lag during abrupt changes. To resolve this, we propose a novel optimistic online mirror descent algorithm that utilizes safeguarded large learning rates up to $\Theta(T)$, where $T$ is the number of rounds. Our key technical contribution is a post-hoc penalty mechanism that dynamically monitors unstable updates and excludes learning rates incurring excessive regret, eliminating the need for restrictive a priori constraints. We show that the cumulative penalty remains $O(\log T)$, allowing our algorithm to match near-optimal worst-case guarantees while achieving superior rates in benign cases. Empirical evaluations on synthetic data and eleven diverse real-world datasets demonstrate that our approach reduces the adaptation lag from hundreds of rounds to a few rounds, consistently outperforming tuning-free baselines.
LG · Aug 5, 2024
Backward Compatibility in Attributive Explanation and Enhanced Model Training Method
Ryuta Matsuno
Updating models is a crucial process in the operation of ML/AI systems. While updating a model generally enhances the average prediction performance, it also significantly impacts the explanations of predictions. In real-world applications, even minor changes in explanations can have detrimental consequences. To tackle this issue, this paper introduces BCX, a quantitative metric that evaluates the backward compatibility of feature attribution explanations between pre- and post-update models. BCX uses practical agreement metrics to calculate the average agreement between the explanations of the pre- and post-update models, restricted to samples that both models predict correctly. In addition, we propose BCXR, a BCX-aware model training method built on surrogate losses that theoretically lower-bound the agreement scores. Furthermore, we present a universal variant of BCXR that improves all agreement metrics by utilizing the L2 distance between the explanations of the models. To validate our approach, we conducted experiments on eight real-world datasets, demonstrating that BCXR achieves superior trade-offs between predictive performance and BCX scores.
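As a rough illustration of the metric described in this abstract, the following sketch averages an agreement score over the samples that both models predict correctly. The function names (`top_k_agreement`, `bcx_score`), the choice of top-k feature overlap as the agreement metric, and the toy data are assumptions made for this example; the abstract does not specify which agreement metrics the paper uses.

```python
from typing import Callable, Sequence

Attribution = Sequence[float]


def top_k_agreement(a: Attribution, b: Attribution, k: int = 2) -> float:
    """Hypothetical agreement metric: overlap of the top-k features by |attribution|."""
    def top_k(x: Attribution) -> set:
        order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
        return set(order[:k])
    return len(top_k(a) & top_k(b)) / k


def bcx_score(
    y_true: Sequence[int],
    pred_old: Sequence[int],
    pred_new: Sequence[int],
    expl_old: Sequence[Attribution],
    expl_new: Sequence[Attribution],
    agreement: Callable[[Attribution, Attribution], float] = top_k_agreement,
) -> float:
    """Average explanation agreement over samples both models predict correctly."""
    both_correct = [
        i for i, y in enumerate(y_true) if pred_old[i] == y and pred_new[i] == y
    ]
    if not both_correct:
        # Assumption: the score is undefined when no sample qualifies.
        return float("nan")
    total = sum(agreement(expl_old[i], expl_new[i]) for i in both_correct)
    return total / len(both_correct)


# Toy data (made up for illustration): only samples 0 and 1 are correct under both models.
y_true = [1, 0, 1, 1]
pred_old = [1, 0, 0, 1]
pred_new = [1, 0, 1, 0]
expl_old = [[0.9, 0.1, -0.5], [0.2, -0.7, 0.1], [0.0, 0.3, 0.4], [0.5, 0.5, 0.1]]
expl_new = [[0.8, -0.6, 0.05], [0.3, -0.9, 0.0], [0.1, 0.2, 0.6], [0.4, 0.1, 0.7]]

score = bcx_score(y_true, pred_old, pred_new, expl_old, expl_new)
print(score)  # 0.75
```

Sample 0 shares one of its two top features across the models (agreement 0.5) and sample 1 shares both (agreement 1.0), so the average is 0.75. Any other agreement function with the same signature can be passed through the `agreement` parameter.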
LG · Aug 14, 2025
Source Component Shift Adaptation via Offline Decomposition and Online Mixing Approach
Ryuta Matsuno
This paper addresses source component shift adaptation, where the goal is to adapt predictions for incoming data streams to source component shifts based on past training data. Existing online learning methods often fail to utilize recurring shifts effectively, while model-pool-based methods struggle to capture individual source components, leading to poor adaptation. We propose a source component shift adaptation method via an offline decomposition and online mixing approach. We theoretically identify that the problem can be divided into two subproblems: offline source component decomposition and online mixing weight adaptation. Based on this, our method first determines prediction models offline through the EM algorithm, each of which learns a single source component solely from past training data. Then, it updates the mixing weights of the prediction models through online convex optimization for precise prediction. Thanks to our theoretical derivation, our method fully leverages the characteristics of the shifts, achieving superior adaptation performance over existing methods. Experiments conducted on various real-world regression datasets demonstrate that our method outperforms baselines, reducing the cumulative test loss by up to 67.4%.
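The online mixing step described in this abstract can be pictured with a minimal sketch. The abstract only says the mixing weights are updated through online convex optimization, so the exponentiated-gradient update below is a swapped-in, standard technique chosen for illustration, not the paper's algorithm. The two fixed component models, the squared loss, the learning rate `eta`, and the data stream are likewise assumptions.

```python
import math
from typing import List, Sequence, Tuple


def eg_update(
    weights: Sequence[float],
    model_preds: Sequence[float],
    y: float,
    eta: float = 0.5,
) -> Tuple[List[float], float]:
    """One exponentiated-gradient step on the squared loss of the mixed prediction."""
    y_hat = sum(w * p for w, p in zip(weights, model_preds))
    # Gradient of (y_hat - y)^2 with respect to each mixing weight.
    grads = [2.0 * (y_hat - y) * p for p in model_preds]
    unnorm = [w * math.exp(-eta * g) for w, g in zip(weights, grads)]
    total = sum(unnorm)
    # Normalizing keeps the weights on the probability simplex.
    return [u / total for u in unnorm], y_hat


# Two fixed, pre-trained component models (hypothetical stand-ins for the offline step).
models = [lambda x: x, lambda x: -x]

weights = [0.5, 0.5]
# The stream follows component 0 for 20 rounds, then shifts to component 1.
stream = [(1.0, 1.0)] * 20 + [(1.0, -1.0)] * 20
history = []
for x, y in stream:
    preds = [f(x) for f in models]
    weights, _ = eg_update(weights, preds, y)
    history.append(list(weights))

print(history[19])  # weight mass concentrated on model 0 before the shift
print(history[39])  # weight mass moved to model 1 after the shift
```

Because the component models stay fixed, only the low-dimensional weight vector has to react when the active source component changes.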
LG · May 27, 2025
Improved Impossible Tuning and Lipschitz-Adaptive Universal Online Learning with Gradient Variations
Kei Takemura, Ryuta Matsuno, Keita Sakuma
A central goal in online learning is to achieve adaptivity to unknown problem characteristics, such as environmental changes captured by gradient variation (GV), function curvature (universal online learning, UOL), and gradient scales (Lipschitz adaptivity, LA). Simultaneously achieving these with optimal performance is a major challenge, partly due to limitations in algorithms for prediction with expert advice. These algorithms often serve as meta-algorithms in online ensemble frameworks, and their sub-optimality hinders overall UOL performance. Specifically, existing algorithms addressing the ``impossible tuning'' issue incur an excess $\sqrt{\log T}$ factor in their regret bound compared to the lower bound. To solve this problem, we propose a novel optimistic online mirror descent algorithm with an auxiliary initial round using large learning rates. This design enables a refined analysis where a generated negative term cancels the gap-related factor, resolving the impossible tuning issue up to $\log\log T$ factors. Leveraging our improved algorithm as a meta-algorithm, we develop the first UOL algorithm that simultaneously achieves state-of-the-art GV bounds and LA under standard assumptions. Our UOL result overcomes key limitations of prior works, notably resolving the conflict between LA mechanisms and regret analysis for GV bounds -- an open problem highlighted by Xie et al.