Black-box tests for algorithmic stability
This work addresses a challenge faced by practitioners who use complex algorithms: verifying the algorithms' stability properties, which matter because stability underpins generalization and predictive inference guarantees. The contribution is incremental, since it builds on existing stability theory by adding a new testing approach.
The paper tackles the problem of empirically assessing the stability of complex modern algorithms. It proposes a formal statistical framework for black-box testing that makes no assumptions on the algorithm or the data distribution, and it establishes fundamental bounds on the ability of any such test to identify stability.
Algorithmic stability is a concept from learning theory that expresses the degree to which changes to the input data (e.g., removal of a single data point) may affect the outputs of a regression algorithm. Knowing an algorithm's stability properties is useful for many downstream applications; for example, stability is known to lead to desirable generalization properties and predictive inference guarantees. However, many modern algorithms currently used in practice are too complex for a theoretical analysis of their stability properties, so these properties can only be established through an empirical exploration of the algorithm's behavior on various data sets. In this work, we lay out a formal statistical framework for this kind of "black-box testing" without any assumptions on the algorithm or the data distribution, and we establish fundamental bounds on the ability of any black-box test to identify algorithmic stability.
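To make the leave-one-out idea concrete, assume one common formalization: an algorithm is (ε, δ)-stable if, for a randomly removed index i and a fresh test point X, P(|f̂(X) − f̂^{−i}(X)| > ε) ≤ δ, where f̂ is trained on all n points and f̂^{−i} on all but point i. The minimal sketch below treats the algorithm purely as a black box `fit(X, y)` that returns a predictor, and estimates that probability by refitting with one point dropped. The names `ridge_fit` and `loo_instability` are hypothetical, and ridge regression is only a stand-in for a complex algorithm. This is a naive plug-in estimate on a single data set, not the paper's test; the abstract's own results concern fundamental bounds on what any black-box test can establish, so read the output as an exploratory diagnostic rather than a guarantee.

```python
import numpy as np


def ridge_fit(X, y, lam=1.0):
    """Hypothetical stand-in black box: ridge regression returning a predictor."""
    d = X.shape[1]
    beta = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    return lambda Z: Z @ beta


def loo_instability(fit, X, y, X_test, eps, n_trials=200, seed=0):
    """Monte Carlo estimate of P(|f(x) - f^{-i}(x)| > eps).

    Each trial drops one random training index i, refits the black box,
    and compares predictions at one random test point x. This is a plug-in
    estimate from a single data set, not a certified stability test.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    full_model = fit(X, y)            # model trained on all n points
    full_preds = full_model(X_test)   # cache full-data predictions on the test pool
    exceed = 0
    for _ in range(n_trials):
        i = rng.integers(n)               # training point to remove
        j = rng.integers(len(X_test))     # test point at which to compare
        keep = np.arange(n) != i
        loo_model = fit(X[keep], y[keep])  # refit without point i
        gap = abs(full_preds[j] - loo_model(X_test[j:j + 1])[0])
        exceed += int(gap > eps)
    return exceed / n_trials


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 5))
    y = X @ np.ones(5) + rng.normal(size=100)
    X_test = rng.normal(size=(50, 5))
    print(loo_instability(ridge_fit, X, y, X_test, eps=0.5))
```

Because `fit` is only ever called, never inspected, any training routine with the same call shape could be substituted; the cost is one full refit per trial, which is exactly the kind of empirical exploration the abstract describes.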