deep-significance - Easy and Meaningful Statistical Significance Testing in the Age of Neural Networks
This package addresses the problem of unreliable empirical progress in ML/DL research. Its contribution is incremental, since it packages existing methods.
The authors address the lack of statistical significance testing in machine learning research with an easy-to-use package of tailored tests and utility functions, aiming to keep spurious improvements from misleading follow-up work and wasting resources.
Much Machine Learning (ML) and Deep Learning (DL) research is empirical in nature. Nevertheless, statistical significance testing (SST) is still not widely used. This endangers true progress: apparent improvements over a baseline may be statistical flukes, which lead follow-up research astray and waste human and computational resources. Here, we provide an easy-to-use package containing different significance tests and utility functions specifically tailored towards research needs and usability.
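To make the concern about statistical flukes concrete, the following is a minimal sketch of one generic significance test, a one-sided permutation test on scores collected over several training runs. It is not taken from the package and does not reflect its API. The function name `permutation_test`, its parameters, and the example scores are all hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def permutation_test(scores_a, scores_b, num_permutations=2000, seed=0):
    """One-sided permutation test (illustrative sketch, not the package's API).

    Null hypothesis: the two sets of scores come from the same distribution.
    Returns an estimated p-value for mean(scores_a) > mean(scores_b).
    """
    rng = random.Random(seed)
    observed = mean(scores_a) - mean(scores_b)

    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)

    at_least_as_extreme = 0
    for _ in range(num_permutations):
        # Randomly reassign scores to the two groups.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            at_least_as_extreme += 1

    # Add-one smoothing keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (num_permutations + 1)


# Hypothetical accuracy scores from five random seeds per model.
new_model = [0.90, 0.91, 0.92, 0.93, 0.94]
baseline = [0.50, 0.51, 0.52, 0.53, 0.54]
p_value = permutation_test(new_model, baseline)
```

A small p-value suggests that the observed improvement would be unlikely if both models performed equally well. A large one signals that the apparent gain may be due to chance, for example the choice of random seeds.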