Rethinking Evaluation of Infrared Small Target Detection
This work addresses evaluation challenges for researchers and practitioners in IRSTD, although its contribution is incremental, since it builds on existing metrics and methods.
The paper tackles the limitations in current evaluation protocols for infrared small target detection (IRSTD) by introducing a hybrid-level metric, systematic error analysis, and cross-dataset evaluation, aiming to provide a more comprehensive framework for assessing model capabilities and robustness.
As an essential vision task, infrared small target detection (IRSTD) has seen significant advancements through deep learning. However, critical limitations in current evaluation protocols impede further progress. First, existing methods rely on fragmented metrics that are specific to either the pixel level or the target level, which fail to provide a comprehensive view of model capabilities. Second, an excessive emphasis on overall performance scores obscures error analysis, which is vital for identifying failure modes and improving real-world system performance. Third, the field predominantly adopts dataset-specific training-testing paradigms, which hinders the understanding of model robustness and generalization across diverse infrared scenarios. This paper addresses these issues by introducing a hybrid-level metric that incorporates both pixel- and target-level performance, proposing a systematic error analysis method, and emphasizing the importance of cross-dataset evaluation. Together, these contributions aim to offer a more thorough and rational hierarchical analysis framework, ultimately fostering the development of more effective and robust IRSTD models. An open-source toolkit has been released to facilitate standardized benchmarking.
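To make the distinction between pixel-level and target-level evaluation concrete, the following is a minimal sketch and not the paper's hybrid-level metric or its toolkit. It computes a pixel-level IoU alongside a target-level detection rate and false-alarm count on a toy mask pair. The function names (`pixel_iou`, `components`, `target_level`, `make_mask`) are hypothetical, and the matching rule (a ground-truth target counts as detected if any predicted region overlaps it) is an assumption chosen for simplicity; published protocols often use a centroid-distance threshold instead.

```python
from collections import deque


def pixel_iou(pred, gt):
    """Pixel-level IoU between two binary masks given as lists of rows."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    return inter / union if union else 1.0


def components(mask):
    """Return 8-connected foreground regions, each as a set of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen, regions = set(), []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            region, queue = set(), deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                region.add((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
            regions.append(region)
    return regions


def target_level(pred, gt):
    """Return (detection rate, false-alarm region count).

    Assumed matching rule: any pixel overlap counts as a detection.
    """
    gt_targets = components(gt)
    pred_regions = components(pred)
    detected = sum(1 for t in gt_targets if any(t & p for p in pred_regions))
    false_alarms = sum(
        1 for p in pred_regions if not any(p & t for t in gt_targets)
    )
    rate = detected / len(gt_targets) if gt_targets else 1.0
    return rate, false_alarms


def make_mask(size, pixels):
    """Build a size x size binary mask with the given (row, col) pixels set."""
    mask = [[0] * size for _ in range(size)]
    for r, c in pixels:
        mask[r][c] = 1
    return mask


# Toy example: two ground-truth targets, one found, one missed, one false alarm.
gt = make_mask(6, [(1, 1), (1, 2), (4, 4)])
pred = make_mask(6, [(1, 1), (1, 2), (1, 3), (0, 5)])

iou = pixel_iou(pred, gt)                          # 2 shared pixels / 5 in union
detection_rate, false_alarms = target_level(pred, gt)
print(iou, detection_rate, false_alarms)           # 0.4 0.5 1
```

In this toy case the pixel-level score (0.4) says nothing about which target was missed, while the target-level pair (0.5, 1) says nothing about how well the detected target's shape was recovered. Each family of metrics therefore gives only a partial view of the same prediction.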