Efficient Data-Dependent Learnability
This work addresses the high computational cost of the pNML learnability measure, which limits its practical use, particularly for detecting out-of-distribution examples.
The paper proposes an approximation of the predictive normalized maximum likelihood (pNML) approach, which is a min-max optimal solution for batch learning. This approximation, based on influence functions, is shown to effectively detect out-of-distribution examples when applied to neural networks.
The predictive normalized maximum likelihood (pNML) approach has recently been proposed as the min-max optimal solution to the batch learning problem in which both the training set and the test data feature are individual, known sequences. This approach yields a learnability measure that can also be interpreted as a stability measure. The measure has shown some potential in detecting out-of-distribution examples, yet it carries considerable computational cost. In this project, we propose and analyze an approximation of the pNML that is based on influence functions. Combining theoretical analysis and experiments, we show that, when applied to neural networks, this approximation can detect out-of-distribution examples effectively. We also compare its performance to that achieved by conducting a single gradient step for each possible label.
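To make the comparison baseline concrete, the following is a minimal sketch of the single-gradient-step variant, not the influence-function approximation itself. It assumes the usual pNML formulation, in which the learnability measure (regret) is the log of the sum, over candidate labels, of the probability each label receives from a model adapted to the test point with that label. A softmax linear classifier stands in for a neural network, and all names and hyperparameters (`fit_softmax`, `pnml_regret_one_step`, `lr`, `steps`) are illustrative assumptions rather than anything taken from the paper.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fit_softmax(X, y, n_classes, lr=0.5, steps=500):
    # Plain gradient descent on the mean cross-entropy loss (stand-in for ERM training).
    W = np.zeros((X.shape[1], n_classes))
    Y = np.eye(n_classes)[y]
    for _ in range(steps):
        P = softmax(X @ W)
        W -= lr * X.T @ (P - Y) / len(X)
    return W

def pnml_regret_one_step(W, x, n_classes, lr=0.5):
    # For each candidate label, take ONE gradient step on the test point
    # labeled with that candidate, then record the probability the adapted
    # model assigns to that same label.
    adapted_probs = np.empty(n_classes)
    p = softmax(x @ W)
    for label in range(n_classes):
        grad = np.outer(x, p - np.eye(n_classes)[label])
        W_label = W - lr * grad
        adapted_probs[label] = softmax(x @ W_label)[label]
    normalizer = adapted_probs.sum()
    # Regret is the log-normalizer; the normalized probabilities form the prediction.
    return np.log(normalizer), adapted_probs / normalizer

# Hypothetical toy data: two classes separated along the first feature.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))
y_train = (X_train[:, 0] > 0).astype(int)
W = fit_softmax(X_train, y_train, n_classes=2)

x_test = rng.normal(size=3)
regret, prediction = pnml_regret_one_step(W, x_test, n_classes=2)
```

In this sketch the regret lies between 0 and the log of the number of classes, and a larger value means the test point can be fit about equally well under several labels. That is the sense in which the measure is used as an out-of-distribution score. The full pNML would retrain on the whole training set plus the test point for every candidate label, which is the cost that the approximations aim to avoid.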