SEAI · Sep 11, 2023

Hazards in Deep Learning Testing: Prevalence, Impact and Recommendations

arXiv:2309.05381v1 · 1 citation · h-index: 44
Originality: Synthesis-oriented
AI Analysis

This work addresses methodological pitfalls in empirical software engineering research, aiming to improve the reliability of deep learning testing studies.

The paper identifies 10 common hazards in deep learning testing experiments that can lead to invalid conclusions, such as Type I errors. It demonstrates their criticality through a sensitivity analysis of 30 influential studies, showing that every hazard has the potential to invalidate findings.

Much research on Machine Learning testing relies on empirical studies that evaluate techniques and demonstrate their potential. However, in this context, empirical results are sensitive to a number of parameters that can adversely affect the outcome of experiments and potentially lead to wrong conclusions (Type I errors, i.e., incorrectly rejecting the Null Hypothesis). To this end, we survey the related literature and identify 10 commonly adopted empirical evaluation hazards that may significantly impact experimental results. We then perform a sensitivity analysis of 30 influential studies published in top-tier SE venues against our hazard set, and demonstrate the hazards' criticality. Our findings indicate that all 10 hazards we identify have the potential to invalidate experimental findings, such as those reported in the related literature, and should be handled properly. Going a step further, we propose a set of 10 good empirical practices that have the potential to mitigate the impact of the hazards. We believe our work forms a first step towards raising awareness of common pitfalls and good practices within the software engineering community, and hopefully contributes towards setting particular expectations for empirical research in the field of deep learning testing.
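To make the notion of a Type I error concrete, the sketch below simulates two hypothetical testing techniques whose true scores are identical, so the Null Hypothesis ("no difference") is true by construction. It compares a naive protocol (one run per technique, higher score wins) against repeated runs followed by a two-sided permutation test. This is an illustrative assumption, not the paper's sensitivity analysis and not necessarily one of its 10 hazards: the helper names (`run_experiment`, `false_positive_rates`), the Gaussian seed noise, the 0.90 score, and the run and trial counts are all hypothetical choices.

```python
import random


def mean(xs):
    """Arithmetic mean of a non-empty list of floats."""
    return sum(xs) / len(xs)


def run_experiment(true_score, noise, rng):
    """Hypothetical single training/evaluation run: the observed score is
    the technique's true score plus seed-dependent Gaussian noise."""
    return true_score + rng.gauss(0.0, noise)


def permutation_p_value(a, b, rng, n_perm=500):
    """Two-sided permutation test on the difference in mean scores."""
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n]) - mean(pooled[n:])) >= observed:
            extreme += 1
    # The +1 correction keeps the p-value valid (it is never exactly zero).
    return (extreme + 1) / (n_perm + 1)


def false_positive_rates(trials=200, runs=10, alpha=0.05, seed=0):
    """Estimate how often 'A beats B' is claimed when A and B are identical."""
    rng = random.Random(seed)
    naive_claims = 0
    tested_claims = 0
    for _ in range(trials):
        # Both techniques share the same true score: the null hypothesis holds.
        a = [run_experiment(0.90, 0.01, rng) for _ in range(runs)]
        b = [run_experiment(0.90, 0.01, rng) for _ in range(runs)]
        # Naive protocol: a single run per technique, the higher score "wins".
        if a[0] > b[0]:
            naive_claims += 1
        # Repeated runs plus a significance test before claiming a win.
        if mean(a) > mean(b) and permutation_p_value(a, b, rng) < alpha:
            tested_claims += 1
    return naive_claims / trials, tested_claims / trials


if __name__ == "__main__":
    naive, tested = false_positive_rates()
    print(f"naive single-run claims: {naive:.2f}")
    print(f"repeated runs + test:    {tested:.2f}")
```

Because the techniques are identical, every "A beats B" claim is a false positive. The naive protocol makes that claim roughly half the time, while the tested protocol keeps it near alpha/2 (the claim is one-directional but the test is two-sided). A permutation test was chosen here only because it needs no third-party libraries; any appropriate statistical test would serve the same illustrative purpose.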

