LG · Sep 25, 2024

Revisiting inverse Hessian vector products for calculating influence functions

arXiv:2409.17357v1 · 10.48 citations · h-index: 7 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses computational bottlenecks for researchers using influence functions in machine learning, but it is incremental as it refines an existing method rather than introducing a new paradigm.

The paper tackles the perceived impracticality of the LiSSA algorithm for computing inverse Hessian-vector products in influence functions. It shows that LiSSA's hyperparameters can be chosen from the spectral properties of the Hessian, and finds that the batch size requirement is mild for the models considered.

Influence functions are a popular tool for attributing a model's output to training data. The traditional approach relies on the calculation of inverse Hessian-vector products (iHVP), but the classical solver "Linear time Stochastic Second-order Algorithm" (LiSSA, Agarwal et al. (2017)) is often deemed impractical for large models due to expensive computation and hyperparameter tuning. We show that the three hyperparameters -- the scaling factor, the batch size, and the number of steps -- can be chosen depending on the spectral properties of the Hessian, particularly its trace and largest eigenvalue. By evaluating with random sketching (Swartworth and Woodruff, 2023), we find that the batch size has to be sufficiently large for LiSSA to converge; however, for all of the models we consider, the requirement is mild. We confirm our findings empirically by comparing to Proximal Bregman Retraining Functions (PBRF, Bae et al. (2022)). Finally, we discuss what role the inverse Hessian plays in calculating the influence.
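
To make the roles of the three hyperparameters concrete, here is a minimal sketch of a LiSSA-style recursion on a toy damped least-squares problem. It is an illustration under stated assumptions, not the paper's implementation: the function name `lissa_ihvp`, the toy Hessian H = XᵀX/n + damping·I, and the choice of scaling factor as twice the largest eigenvalue are all hypothetical choices made for this example. The recursion is r ← r − (H_B r − v)/scale, where H_B is the Hessian estimated on a mini-batch B.

```python
import numpy as np

def lissa_ihvp(X, v, damping, scale, batch_size, num_steps, rng):
    """Approximate H^{-1} v for the toy Hessian H = X^T X / n + damping * I.

    Uses the scaled fixed-point recursion r <- r - (H_B r - v) / scale,
    where H_B is the Hessian estimated on a random mini-batch B.
    """
    n = X.shape[0]
    r = v / scale  # first term of the scaled Neumann series
    for _ in range(num_steps):
        # Draw a mini-batch without replacement (batch_size == n gives the full Hessian).
        idx = rng.choice(n, size=batch_size, replace=False)
        Xb = X[idx]
        # Mini-batch Hessian-vector product; the d x d matrix is never formed.
        hvp = Xb.T @ (Xb @ r) / batch_size + damping * r
        r = r - (hvp - v) / scale
    return r

# Toy setup (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n, d, damping = 512, 10, 0.1
X = rng.standard_normal((n, d))
v = rng.standard_normal(d)

H = X.T @ X / n + damping * np.eye(d)
lambda_max = np.linalg.eigvalsh(H)[-1]  # eigvalsh returns eigenvalues in ascending order
scale = 2.0 * lambda_max                # assumed choice: keeps every eigenvalue of H / scale below 1
exact = np.linalg.solve(H, v)

for batch_size in (n, 256):
    approx = lissa_ihvp(X, v, damping, scale, batch_size, num_steps=200, rng=rng)
    rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
    print(f"batch_size={batch_size}: relative error {rel_err:.2e}")
```

With the full batch the recursion is deterministic and contracts geometrically toward the exact solve. With a smaller batch the iterate fluctuates around the solution because each step uses a noisy Hessian estimate, which is the effect behind the batch-size requirement discussed in the abstract.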
