Statistical and Computational Guarantees for Influence Diagnostics
This work provides theoretical guarantees for influence diagnostics, which are used to identify influential data points in machine learning and AI applications. The contribution is incremental, since it builds on existing methods.
The paper establishes finite-sample statistical bounds and computational complexity bounds for influence diagnostics, such as influence functions and approximate maximum influence perturbations, using efficient inverse-Hessian-vector product implementations, with results illustrated on generalized linear models and large attention-based models.
Influence diagnostics such as influence functions and approximate maximum influence perturbations are widely used in machine learning and in applied AI domains. They are powerful statistical tools for identifying influential data points or subsets of data points. We establish finite-sample statistical bounds, as well as computational complexity bounds, for influence functions and approximate maximum influence perturbations computed with efficient inverse-Hessian-vector product implementations. We illustrate our results with generalized linear models and large attention-based models on synthetic and real data.
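To make the objects in the abstract concrete, the following is a minimal sketch of an influence-function computation for one generalized linear model, L2-regularized logistic regression. It is an illustration under stated assumptions, not the paper's implementation. It uses the classical first-order formula, influence(z_i, z_test) = -grad L(z_test)^T H^{-1} grad L(z_i), and obtains the inverse-Hessian-vector product by conjugate gradient, so the Hessian is never formed or inverted explicitly. The function names, the regularization strength `l2`, and the choice of conjugate gradient as the solver are all assumptions made for this example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, l2=1e-2, iters=50):
    """Fit L2-regularized logistic regression (mean loss) by Newton's method."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) / n + l2 * theta
        hess = (X.T * (p * (1.0 - p))) @ X / n + l2 * np.eye(d)
        theta -= np.linalg.solve(hess, grad)
    return theta


def hvp(X, theta, v, l2):
    """Hessian-vector product H v, computed without materializing H."""
    p = sigmoid(X @ theta)
    w = p * (1.0 - p)
    return X.T @ (w * (X @ v)) / X.shape[0] + l2 * v


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, given only v -> A v."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


def influence_on_test_loss(X, y, theta, x_test, y_test, l2):
    """Return one score per training point: -g_test^T H^{-1} g_i."""
    g_test = (sigmoid(x_test @ theta) - y_test) * x_test
    # Inverse-Hessian-vector product s = H^{-1} g_test via conjugate gradient.
    s = conjugate_gradient(lambda v: hvp(X, theta, v, l2), g_test)
    # Per-example gradients of the (unregularized) logistic loss, shape (n, d).
    per_example_grads = (sigmoid(X @ theta) - y)[:, None] * X
    return -per_example_grads @ s
```

Under these assumptions, a strongly negative score marks a training point whose upweighting is predicted to lower the test loss, and a strongly positive score marks one predicted to raise it. One solve is shared across all training points, so the dominant cost is a handful of Hessian-vector products plus one pass over the per-example gradients.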