LG · AI · Mar 1, 2025

A Guide to Failure in Machine Learning: Reliability and Robustness from Foundations to Practice

arXiv:2503.00563v1 · h-index: 7
Originality: Synthesis-oriented
AI Analysis

This work addresses a barrier to ML adoption by giving practitioners a structured way to reason about model failures. It is incremental: it synthesizes existing concepts rather than introducing new methods.

The paper tackles the problem of unexpected failures in machine learning models by differentiating between lack of reliability and lack of robustness, providing a guide with theoretical concepts, techniques, and real-world examples to help practitioners understand and address these issues.

One of the main barriers to adoption of Machine Learning (ML) is that ML models can fail unexpectedly. In this work, we aim to provide practitioners with a guide to better understand why ML models fail and equip them with techniques they can use to reason about failure. Specifically, we discuss failure as being caused by either a lack of reliability or a lack of robustness. Differentiating the causes of failure in this way allows us to formally define why models fail from first principles and tie these definitions to engineering concepts and real-world deployment settings. Throughout the document we provide 1) a summary of important theoretical concepts in reliability and robustness, 2) a sampling of current techniques that practitioners can utilize to reason about ML model reliability and robustness, and 3) examples that show how these concepts and techniques can apply to real-world settings.

