CV, LG | Jun 1, 2023

Universal Test-time Adaptation through Weight Ensembling, Diversity Weighting, and Prior Correction

Robert A. Marsden, Mario Döbler, Bin Yang

arXiv:2306.00650v223.877 citations | h-index: 8 | Has Code

Originality: Highly original

AI Analysis

This work addresses the practical need for robust online adaptation of machine learning models under diverse environmental conditions and, according to the authors, is the first to cover such a broad spectrum of test-time settings.

The paper tackles the problem of universal test-time adaptation (TTA) to handle various distribution shifts during deployment, proposing a method that addresses challenges like model bias, loss of generalization, and class prior shifts, and reports setting new standards in performance across multiple settings and datasets.

Since distribution shifts are likely to occur at test time and can drastically decrease a model's performance, online test-time adaptation (TTA) continues to update the model after deployment, leveraging the current test data. Clearly, a method proposed for online TTA has to perform well under all kinds of environmental conditions. By introducing the variable factors domain non-stationarity and temporal correlation, we first unfold all practically relevant settings and define their entirety as universal TTA. We want to highlight that this is the first work that covers such a broad spectrum, which is indispensable for use in practice. To tackle the problem of universal TTA, we identify and highlight several challenges a self-training based method has to deal with: 1) model bias and the occurrence of trivial solutions when performing entropy minimization on varying sequence lengths with and without multiple domain shifts, 2) loss of generalization, which makes adaptation to multiple domain shifts harder and worsens catastrophic forgetting, and 3) performance degradation due to shifts in the class prior. To prevent the model from becoming biased, we leverage a dataset- and model-agnostic certainty and diversity weighting. To maintain generalization and prevent catastrophic forgetting, we propose to continually weight-average the source and adapted model. To compensate for disparities in the class prior at test time, we propose an adaptive prior correction scheme that reweights the model's predictions. We evaluate our approach, named ROID, on a wide range of settings, datasets, and models, setting new standards in the field of universal TTA. Code is available at: https://github.com/mariodoebler/test-time-adaptation
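To make two of the ideas in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The function names `weight_ensemble` and `prior_correct`, the `momentum` and `smoothing` parameters, their default values, and the specific smoothing formula are all assumptions chosen for illustration. The first function averages the adapted weights with the frozen source weights after an update step. The second estimates a class prior from the batch mean of the predictions, smooths it toward uniform, and uses it to reweight and renormalize each prediction.

```python
from typing import Dict, List


def weight_ensemble(
    source: Dict[str, List[float]],
    adapted: Dict[str, List[float]],
    momentum: float = 0.99,  # assumed value, for illustration only
) -> Dict[str, List[float]]:
    """Pull the adapted weights a small step back toward the source weights."""
    return {
        name: [
            momentum * a + (1.0 - momentum) * s
            for a, s in zip(adapted[name], source[name])
        ]
        for name in adapted
    }


def prior_correct(
    probs: List[List[float]],
    smoothing: float = 0.1,  # assumed value, for illustration only
) -> List[List[float]]:
    """Reweight softmax outputs by a smoothed batch estimate of the class prior."""
    num_classes = len(probs[0])
    # Batch mean of the predictions serves as a rough class-prior estimate.
    batch_mean = [
        sum(row[c] for row in probs) / len(probs) for c in range(num_classes)
    ]
    # Additive smoothing toward the uniform prior; the result still sums to 1.
    prior = [
        (m + smoothing) / (1.0 + smoothing * num_classes) for m in batch_mean
    ]
    corrected = []
    for row in probs:
        scaled = [p * q for p, q in zip(row, prior)]
        total = sum(scaled)
        corrected.append([v / total for v in scaled])
    return corrected
```

In an online loop, one would call `weight_ensemble` once after each gradient step on a test batch and apply `prior_correct` only to the predictions that are reported, leaving the loss computation untouched. That ordering is also an assumption of this sketch.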

