CV · Jul 2, 2025

Learning from Random Subspace Exploration: Generalized Test-Time Augmentation with Self-supervised Distillation

arXiv:2507.01347v1 · h-index: 20
Originality: Incremental advance
AI Analysis

GTTA offers a general, off-the-shelf way to improve the robustness and test-time efficiency of trained models across applications, though it is an incremental advance over existing test-time augmentation methods.

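The paper addresses how to improve the performance of an already trained model. It introduces Generalized Test-Time Augmentation (GTTA), which builds a test-time ensemble from random perturbations of the input's PCA subspace projection and then distills that ensemble back into the original model through self-supervised learning. The ensemble reduces estimator errors, the distillation cuts test-time computational cost with no loss in accuracy, and experiments on vision and non-vision tasks support the method's generality.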

We introduce Generalized Test-Time Augmentation (GTTA), a highly effective method for improving the performance of a trained model. Unlike existing Test-Time Augmentation approaches in the literature, it is general enough to be used off-the-shelf for many vision and non-vision tasks, such as classification, regression, image segmentation and object detection. By applying a new general data transformation that randomly perturbs the PCA subspace projection of a test input multiple times, GTTA forms robust ensembles at test time in which, due to sound statistical properties, the structural and systematic noise in the initial input data is filtered out and final estimator errors are reduced. Unlike existing methods, we also propose a final self-supervised learning stage in which the ensemble output, acting as an unsupervised teacher, is used to train the initial single student model, thus significantly reducing the test-time computational cost at no loss in accuracy. Our tests and comparisons to strong TTA approaches and SoTA models on various well-known vision and non-vision datasets and tasks, such as image classification and segmentation, speech recognition and house price prediction, validate the generality of the proposed GTTA. Furthermore, we demonstrate its effectiveness on the more specific real-world task of salmon segmentation and detection in low-visibility underwater videos, for which we introduce DeepSalmon, the largest dataset of its kind in the literature.
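
The two stages described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names (`fit_pca`, `gtta_predict`, `distill_linear_student`), the Gaussian noise model, the `scale` parameter, the choice to keep the out-of-subspace residual of the input, and the linear least-squares student are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np

def fit_pca(X, k):
    """Return the data mean, top-k principal directions (as rows) and their std devs."""
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of Vt are orthonormal principal directions.
    _, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    stds = S[:k] / np.sqrt(max(len(X) - 1, 1))
    return mean, Vt[:k], stds

def gtta_predict(model, x, mean, components, stds, n_aug=32, scale=0.1, rng=None):
    """Average model outputs over random perturbations of x's PCA coefficients.

    Assumption: noise is Gaussian, scaled per component by `scale * std`, and the
    part of x outside the subspace is kept unchanged.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = (x - mean) @ components.T            # PCA coefficients of the test input
    residual = x - mean - z @ components     # component of x outside the subspace
    noise = rng.normal(0.0, scale, size=(n_aug, len(z))) * stds
    x_aug = mean + (z + noise) @ components + residual
    preds = np.stack([model(xa) for xa in x_aug])
    return preds.mean(axis=0)                # ensemble output (the "teacher")

def distill_linear_student(X_unlabeled, teacher_outputs):
    """Fit a hypothetical linear student to the ensemble outputs by least squares.

    No ground-truth labels are used: the targets come from the ensemble alone.
    """
    A = np.hstack([X_unlabeled, np.ones((len(X_unlabeled), 1))])  # add bias column
    coef, *_ = np.linalg.lstsq(A, teacher_outputs, rcond=None)
    return lambda v: np.append(v, 1.0) @ coef
```

In use, one would fit the PCA on available data, call `gtta_predict` on each unlabeled input to obtain teacher targets, and then train the student on those targets so that a single forward pass replaces the `n_aug` passes of the ensemble. With `scale=0.0` the perturbed inputs equal the original input, so the ensemble output reduces to the plain model output.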

