A Set of Rules for Model Validation
This work addresses the need for standardized validation practices in machine learning. Its contribution is incremental: it builds on existing validation concepts rather than introducing new methods.
The paper tackles the problem of assessing how well data-driven models generalize to unseen data. It proposes a set of general rules for model validation, intended to help practitioners create reliable validation plans and report results transparently.
The validation of a data-driven model is the process of assessing the model's ability to generalize to new, unseen data in the population of interest. This paper proposes a set of general rules for model validation. These rules are designed to help practitioners create reliable validation plans and report their results transparently. While no validation scheme is flawless, these rules can help practitioners ensure their strategy is sufficient for practical use, openly discuss any limitations of their validation strategy, and report clear, comparable performance metrics.
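To make the notion of "assessing generalization to unseen data" concrete, the following is a minimal sketch of k-fold cross-validation, one common validation scheme. It is not taken from the paper, and the rules themselves are not stated in this abstract. The synthetic data, the one-dimensional least-squares model, and the helper names `fit_line`, `mse`, and `k_fold_scores` are all assumptions made for illustration.

```python
import random
import statistics


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a*x + b (illustrative model)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def mse(model, xs, ys):
    """Mean squared error of a fitted (a, b) line on the given points."""
    a, b = model
    return statistics.fmean((a * x + b - y) ** 2 for x, y in zip(xs, ys))


def k_fold_scores(xs, ys, k=5, seed=0):
    """Return one held-out MSE per fold; each point is held out exactly once."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)           # fixed seed keeps the split reproducible
    folds = [idx[i::k] for i in range(k)]      # k disjoint index sets covering all points
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mse(model, [xs[i] for i in held_out], [ys[i] for i in held_out]))
    return scores


# Hypothetical synthetic data: y = 2x + 1 plus unit-variance Gaussian noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1.0) for x in xs]

scores = k_fold_scores(xs, ys, k=5, seed=0)
print("held-out MSE per fold:", [round(s, 3) for s in scores])
print(f"mean = {statistics.fmean(scores):.3f}, sd = {statistics.stdev(scores):.3f}")
```

Reporting the per-fold scores together with their mean and spread, along with the split seed, is one way to keep results comparable across studies. It does not address limitations such as whether the sampled data represent the population of interest, which would still need to be discussed separately.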