LG AI ML Sep 4, 2024

Oops, I Sampled it Again: Reinterpreting Confidence Intervals in Few-Shot Learning

Raphael Lafargue, Luke Smith, Franck Vermet, Mathias Löwe, Ian Reid, Vincent Gripon, Jack Valmadre

arXiv:2409.02850v22.6 h-index: 13 Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses a methodological flaw in few-shot learning evaluations. The contribution is incremental but important for researchers in the field.

The paper identifies that confidence intervals in few-shot learning are misleading because they account for the randomness of the task sampler but not for the variability of the data itself, and it shows that intervals computed with replacement are notably underestimated. It proposes paired tests to partially address this issue and introduces an optimized benchmark.

The predominant method for computing confidence intervals (CI) in few-shot learning (FSL) is based on sampling the tasks with replacement, i.e. allowing the same samples to appear in multiple tasks. This makes the CI misleading in that it takes into account the randomness of the sampler but not the data itself. To quantify the extent of this problem, we conduct a comparative analysis between CIs computed with and without replacement. This analysis reveals a notable underestimation by the predominant method. This observation calls for a reevaluation of how we interpret confidence intervals and the resulting conclusions in FSL comparative studies. Our research demonstrates that the use of paired tests can partially address this issue. Additionally, we explore methods to further reduce the size of the CI by strategically sampling tasks of a specific size. We also introduce a new optimized benchmark, which can be accessed at https://github.com/RafLaf/FSL-benchmark-again
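To make the paired-test idea concrete, here is a minimal sketch, not taken from the paper or its benchmark. It assumes synthetic per-task accuracies for two hypothetical methods evaluated on the same tasks, and a normal-approximation 95% interval (the factor 1.96). The names `mean_ci95`, `acc_a`, and `acc_b` are illustrative only. It contrasts the interval on the per-task difference (paired) with the interval obtained by treating the two sets of accuracies as independent.

```python
import numpy as np


def mean_ci95(values):
    """Return (mean, half-width) of a normal-approximation 95% CI of the mean."""
    values = np.asarray(values, dtype=float)
    half_width = 1.96 * values.std(ddof=1) / np.sqrt(len(values))
    return values.mean(), half_width


rng = np.random.default_rng(0)
n_tasks = 600

# Synthetic data (assumption): a shared per-task difficulty term makes the
# two methods' accuracies strongly correlated across tasks.
difficulty = rng.normal(0.0, 0.10, size=n_tasks)
acc_a = np.clip(0.70 + difficulty + rng.normal(0.0, 0.02, size=n_tasks), 0.0, 1.0)
acc_b = np.clip(0.71 + difficulty + rng.normal(0.0, 0.02, size=n_tasks), 0.0, 1.0)

# Separate intervals for each method.
mean_a, hw_a = mean_ci95(acc_a)
mean_b, hw_b = mean_ci95(acc_b)

# Unpaired interval on the difference: combine the two half-widths as if
# the two sets of accuracies were independent.
unpaired_hw = np.sqrt(hw_a**2 + hw_b**2)

# Paired interval: compute the CI directly on the per-task differences.
mean_diff, paired_hw = mean_ci95(acc_b - acc_a)

print(f"method A: {mean_a:.4f} +/- {hw_a:.4f}")
print(f"method B: {mean_b:.4f} +/- {hw_b:.4f}")
print(f"difference (unpaired): {mean_b - mean_a:.4f} +/- {unpaired_hw:.4f}")
print(f"difference (paired):   {mean_diff:.4f} +/- {paired_hw:.4f}")
```

Because both methods see the same tasks, the shared task difficulty cancels in the per-task differences, so the paired interval is narrower than the unpaired one in this synthetic setting. The sketch does not model sampling with versus without replacement from a finite dataset, which is the other comparison the abstract describes.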
