CV Nov 23, 2020

Better Aggregation in Test-Time Augmentation

Divya Shanmugam, Davis Blalock, Guha Balakrishnan, John Guttag

arXiv:2011.11156v227.2219 citations

Originality Incremental advance

AI Analysis

This work targets suboptimal prediction aggregation in test-time augmentation for image classification, offering practitioners an incremental improvement to a widely used technique.

This paper investigates the suboptimality of simple averaging in test-time augmentation (TTA) for image classification, revealing that TTA can convert correct predictions to incorrect ones despite overall accuracy gains. The authors propose a learning-based aggregation method that consistently outperforms existing TTA approaches across various models, datasets, and augmentations.

Test-time augmentation -- the aggregation of predictions across transformed versions of a test input -- is a common practice in image classification. Traditionally, predictions are combined using a simple average. In this paper, we present 1) experimental analyses that shed light on cases in which the simple average is suboptimal and 2) a method to address these shortcomings. A key finding is that even when test-time augmentation produces a net improvement in accuracy, it can change many correct predictions into incorrect predictions. We delve into when and why test-time augmentation changes a prediction from being correct to incorrect and vice versa. Building on these insights, we present a learning-based method for aggregating test-time augmentations. Experiments across a diverse set of models, datasets, and augmentations show that our method delivers consistent improvements over existing approaches.
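To make the aggregation step concrete, here is a minimal sketch of test-time augmentation with a simple average next to a weighted average. It is an illustration only, not the authors' method: the function names, the toy probabilities, the augmentation labels, and the hand-picked weights are all assumptions. In a learning-based approach the weights would be fit on held-out data rather than set by hand.

```python
import numpy as np

def aggregate_mean(probs):
    """Simple average over augmentations.

    probs has shape (n_augmentations, n_classes); returns (n_classes,).
    """
    return probs.mean(axis=0)

def aggregate_weighted(probs, weights):
    """Weighted average with one weight per augmentation.

    The weights are normalized to sum to 1. Here they are supplied by the
    caller (hypothetical values); a learned aggregator would estimate them.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return weights @ probs

# Toy example (made-up numbers): 3 augmented views, 2 classes, true class is 0.
probs = np.array([
    [0.90, 0.10],  # original image: correct
    [0.30, 0.70],  # hypothetical augmentation A: incorrect
    [0.20, 0.80],  # hypothetical augmentation B: incorrect
])

mean_pred = int(np.argmax(aggregate_mean(probs)))
weighted_pred = int(np.argmax(aggregate_weighted(probs, [0.8, 0.1, 0.1])))

print("simple average predicts class", mean_pred)
print("weighted average predicts class", weighted_pred)
```

In this toy case the simple average turns a correct prediction on the original image into an incorrect one, while a weighting that trusts the original view more keeps it correct. This is the kind of flip the abstract describes.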
