Extending Kernel Trick to Influence Functions
This work addresses the computational bottleneck of influence functions for large models, offering a practical alternative for model debugging and data valuation in overparameterized settings.
The paper introduces a dual representation of influence functions whose computational cost scales with dataset size rather than model size, enabling efficient estimation of parameter and output changes for large models when the dataset is small relative to the model. Experiments show that it matches the original influence functions while running faster in this setting.
In this paper, we present a dual representation of influence functions whose computational complexity scales with dataset size rather than model size. Both analytically and experimentally, we show that this representation can be an efficient alternative to the original influence functions for estimating changes in parameters, model outputs, and loss due to data point removal when model size is large relative to dataset size, or when evaluating the original influence functions in parameter space is infeasible. The dual representation, however, is limited to linearizable models, that is, models whose behavior can be approximated by their linearizations throughout training, and it requires materializing a matrix whose size grows with the product of model output dimension and dataset size.
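To make the scaling argument concrete, the following is a minimal sketch and not the paper's exact formulation. It assumes a linearized model with scalar output, squared loss, and a damping term lam, so that the Hessian is H = J^T J + lam I, where J is the N x P Jacobian of the outputs with respect to the parameters. Under these assumptions, the push-through identity (J^T J + lam I)^{-1} J^T = J^T (J J^T + lam I)^{-1} lets the P x P solve be replaced by an N x N solve. The names primal_influence, dual_influence, J, r, and lam are illustrative and do not come from the paper.

```python
import numpy as np

def primal_influence(J, r, i, lam):
    """Parameter-space estimate: solves a P x P system (cost grows with model size)."""
    P = J.shape[1]
    H = J.T @ J + lam * np.eye(P)      # damped Gauss-Newton Hessian (assumption)
    g_i = J[i] * r[i]                  # gradient of 0.5 * r_i^2 for data point i
    return np.linalg.solve(H, g_i)     # approximate parameter change on removing point i

def dual_influence(J, r, i, lam):
    """Dual estimate: solves an N x N system (cost grows with dataset size)."""
    N = J.shape[0]
    K = J @ J.T + lam * np.eye(N)      # damped kernel (Gram) matrix, N x N
    e = np.zeros(N)
    e[i] = r[i]                        # residual of point i placed in its own slot
    alpha = np.linalg.solve(K, e)      # dual coefficients
    return J.T @ alpha                 # map back to parameter space

# Synthetic overparameterized setting: many more parameters than data points.
rng = np.random.default_rng(0)
N, P, lam = 20, 500, 0.1
J = rng.normal(size=(N, P))            # stand-in for the model Jacobian
r = rng.normal(size=N)                 # stand-in for residuals f(x_i) - y_i

delta_primal = primal_influence(J, r, 3, lam)
delta_dual = dual_influence(J, r, 3, lam)

# Estimated change in model outputs on all N points under the linearization.
output_change = J @ delta_dual
max_abs_diff = float(np.max(np.abs(delta_primal - delta_dual)))
```

In this toy setting the two estimates agree up to numerical error, while the dual form only ever factorizes a 20 x 20 matrix instead of a 500 x 500 one. For a model with multiple outputs, the corresponding matrix would grow with the product of output dimension and dataset size, which is the limitation noted in the abstract.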