CV · Nov 23, 2020

Better Aggregation in Test-Time Augmentation

arXiv:2011.11156v2 · 209 citations
AI Analysis

This work addresses suboptimal prediction aggregation in test-time augmentation for image classification, offering practitioners an incremental improvement to a widely used technique.

This paper investigates the suboptimality of simple averaging in test-time augmentation (TTA) for image classification, revealing that TTA can convert correct predictions to incorrect ones despite overall accuracy gains. The authors propose a learning-based aggregation method that consistently outperforms existing TTA approaches across various models, datasets, and augmentations.

Test-time augmentation -- the aggregation of predictions across transformed versions of a test input -- is a common practice in image classification. Traditionally, predictions are combined using a simple average. In this paper, we present 1) experimental analyses that shed light on cases in which the simple average is suboptimal and 2) a method to address these shortcomings. A key finding is that even when test-time augmentation produces a net improvement in accuracy, it can change many correct predictions into incorrect predictions. We delve into when and why test-time augmentation changes a prediction from being correct to incorrect and vice versa. Building on these insights, we present a learning-based method for aggregating test-time augmentations. Experiments across a diverse set of models, datasets, and augmentations show that our method delivers consistent improvements over existing approaches.
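The key finding above, that a simple average can raise net accuracy while still turning some correct predictions into incorrect ones, can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the toy probabilities, the three augmentation names, the helper names (`aggregate`, `simple_average`, `count_flips`), and the hand-picked weights. The weighted aggregation is only a stand-in for the idea of non-uniform aggregation; it is not the paper's learned method, whose details are not given in this summary.

```python
from typing import List, Sequence, Tuple

Probs = List[float]  # class probabilities for one augmented view


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


def aggregate(aug_probs: Sequence[Probs], weights: Sequence[float]) -> Probs:
    """Weighted sum of per-augmentation class probabilities."""
    n_classes = len(aug_probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, aug_probs))
        for c in range(n_classes)
    ]


def simple_average(aug_probs: Sequence[Probs]) -> Probs:
    """Standard TTA: every augmentation gets the same weight."""
    m = len(aug_probs)
    return aggregate(aug_probs, [1.0 / m] * m)


def count_flips(
    original: Sequence[int], tta: Sequence[int], labels: Sequence[int]
) -> Tuple[int, int]:
    """Return (corrected, corrupted): wrong->right and right->wrong counts."""
    corrected = sum(o != y and t == y for o, t, y in zip(original, tta, labels))
    corrupted = sum(o == y and t != y for o, t, y in zip(original, tta, labels))
    return corrected, corrupted


# Toy data (hypothetical): 4 test images, 2 classes, 3 views per image
# in the order [identity, flip, crop].
examples = [
    [[0.70, 0.30], [0.10, 0.90], [0.45, 0.55]],
    [[0.60, 0.40], [0.20, 0.80], [0.40, 0.60]],
    [[0.90, 0.10], [0.80, 0.20], [0.70, 0.30]],
    [[0.52, 0.48], [0.40, 0.60], [0.45, 0.55]],
]
labels = [0, 1, 0, 1]

# Prediction without TTA uses the identity view only.
original_preds = [argmax(views[0]) for views in examples]
average_preds = [argmax(simple_average(views)) for views in examples]

# Hand-picked weights, standing in for weights that would be learned
# on held-out data.
weights = [0.4, 0.0, 0.6]
weighted_preds = [argmax(aggregate(views, weights)) for views in examples]

print("simple average flips:", count_flips(original_preds, average_preds, labels))
print("weighted flips:      ", count_flips(original_preds, weighted_preds, labels))
```

On this toy data the simple average corrects two predictions but corrupts one, so accuracy still rises from 2/4 to 3/4. The non-uniform weights keep both corrections and avoid the corruption. In practice such weights would have to be fit on data separate from the test set.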

