CV · Nov 16, 2025

Open-World Test-Time Adaptation with Hierarchical Feature Aggregation and Attention Affine

Ziqiong Liu, Yushun Tang, Junyang Ji, Zhihai He

arXiv:2511.12607v13.6

Originality: Synthesis-oriented

AI Analysis

This work addresses a specialized problem for machine learning practitioners who must handle out-of-distribution samples during test-time adaptation. It represents an incremental improvement over existing methods.

The paper tackles test-time adaptation in open-world scenarios, where samples from unseen categories can degrade both classification accuracy and the adaptation process itself. The proposed method significantly improves performance on standard benchmark classification datasets.

Test-time adaptation (TTA) adjusts a model during the testing phase so that it can cope with shifts in the sample distribution and adapt to new environments. In real-world scenarios, models often encounter samples from unseen categories (out-of-distribution, OOD). Misclassifying these as known (in-distribution, ID) classes degrades predictive accuracy. It can also corrupt the adaptation process, which causes further errors on subsequent ID samples. Many existing TTA methods suffer substantial performance drops under such conditions.

To address this challenge, we propose a Hierarchical Ladder Network (HLN) that extracts OOD features from class tokens aggregated across all Transformer layers. We improve OOD detection by combining the original model's prediction with the HLN output through weighted probability fusion. To improve robustness under domain shift, we further introduce an Attention Affine Network (AAN). The AAN adaptively refines the self-attention mechanism conditioned on token information, which improves classification performance on datasets with domain shift. In addition, a weighted entropy mechanism dynamically suppresses the influence of low-confidence samples during adaptation.

Experimental results show that our method significantly improves performance on the most widely used benchmark classification datasets.
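The abstract describes these components only at a high level. The following minimal NumPy sketch illustrates three of them under explicit assumptions.

- **Class-token aggregation.** Class tokens are averaged over layers with softmax-normalized layer weights. This is a hypothetical stand-in for the HLN input; the ladder network itself is not specified here.
- **Weighted probability fusion.** The model and HLN probabilities are mixed using a hypothetical weight `alpha`, and the fused result is scored with a max-probability OOD score.
- **Weighted entropy loss.** Each sample's entropy is weighted by `1 - H / log(C)`, so near-uniform predictions contribute almost nothing.

All function names, `alpha`, the OOD score, and the weighting rule are illustrative assumptions, not the authors' implementation. The Attention Affine Network is not sketched because the abstract does not specify its form.

```python
import numpy as np


def softmax(logits, axis=-1):
    # Numerically stable softmax along the given axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def aggregate_class_tokens(cls_tokens, layer_weights=None):
    """Aggregate per-layer class tokens of shape (L, B, D) into (B, D).

    Hypothetical stand-in for the HLN input: a softmax-weighted average
    over all Transformer layers (uniform weights if none are given).
    """
    num_layers = cls_tokens.shape[0]
    if layer_weights is None:
        layer_weights = np.zeros(num_layers)
    w = softmax(np.asarray(layer_weights, dtype=float))
    return np.tensordot(w, cls_tokens, axes=(0, 0))


def fuse_probabilities(model_logits, hln_logits, alpha=0.5):
    # Weighted probability fusion: alpha * p_model + (1 - alpha) * p_hln.
    # alpha is an assumed hyperparameter, not taken from the paper.
    return alpha * softmax(model_logits) + (1.0 - alpha) * softmax(hln_logits)


def ood_score(probs):
    # Assumed max-softmax-probability score: higher means more likely OOD.
    return 1.0 - probs.max(axis=-1)


def entropy(probs, eps=1e-12):
    # Shannon entropy (natural log) of each row of a probability matrix.
    return -(probs * np.log(probs + eps)).sum(axis=-1)


def confidence_weights(probs):
    # Assumed weighting rule: 1 - H / log(C), clipped to [0, 1].
    # Near-uniform (low-confidence) predictions receive weights close to 0.
    # Requires at least two classes so that log(C) > 0.
    num_classes = probs.shape[-1]
    w = 1.0 - entropy(probs) / np.log(num_classes)
    return np.clip(w, 0.0, 1.0)


def weighted_entropy_loss(probs):
    # Mean of per-sample entropies scaled by the confidence weights.
    # In an autodiff framework the weights would be treated as constants.
    return float(np.mean(confidence_weights(probs) * entropy(probs)))
```

In an actual TTA loop, the confidence weights would typically be computed without gradient tracking, so that only the entropy term drives the parameter update. In this sketch, `alpha` and the layer weights are fixed inputs. The paper may tune, learn, or schedule them differently.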
