CONTRAST: Continual Multi-source Adaptation to Dynamic Distributions
This addresses a practical problem in machine learning for scenarios with evolving data streams, though it appears incremental as it builds on ensemble methods with specific adaptations.
The paper tackles the challenge of adapting to dynamic data distributions with small test batches and no source data access, proposing CONTRAST which optimally combines multiple source models and selectively updates parameters to prevent forgetting, achieving performance at least as good as the best source model and maintaining robustness over time.
Adapting to dynamic data distributions is a practical yet challenging task. One effective strategy is to use a model ensemble, which leverages the diverse expertise of different models to transfer knowledge to evolving data distributions. However, this approach faces difficulties when the dynamic test distribution is available only in small batches and without access to the original source data. To address the challenge of adapting to dynamic distributions in such practical settings, we propose Continual Multi-source Adaptation to Dynamic Distributions (CONTRAST), a novel method that optimally combines multiple source models to adapt to the dynamic test data. CONTRAST has two distinguishing features. First, it efficiently computes the optimal combination weights to combine the source models to adapt to the test data distribution continuously as a function of time. Second, it identifies which of the source model parameters to update so that only the model which is most correlated to the target data is adapted, leaving the less correlated ones untouched; this mitigates the issue of ``forgetting" the source model parameters by focusing only on the source model that exhibits the strongest correlation with the test batch distribution. Through theoretical analysis we show that the proposed method is able to optimally combine the source models and prioritize updates to the model least prone to forgetting. Experimental analysis on diverse datasets demonstrates that the combination of multiple source models does at least as well as the best source (with hindsight knowledge), and performance does not degrade as the test data distribution changes over time (robust to forgetting).