LG · May 26, 2023

Theoretical and Practical Perspectives on what Influence Functions Do

arXiv:2305.16971v1 · 33 citations
Originality: Incremental advance
AI Analysis

This work addresses the practical utility of influence functions for machine learning researchers and practitioners, particularly in NLP and computer vision. It clarifies what can be expected of influence functions theoretically and shows that they remain useful for model debugging.

The paper investigates the mismatch between the theoretical promise of influence functions (IF) for explaining model predictions and their poor empirical performance at predicting leave-one-out-and-retrain effects. It identifies parameter divergence as the key limitation: the predictive power of IF fades over training time. It nevertheless shows that IF remain useful for model debugging, since mis-predictions can be corrected with a few fine-tuning steps on influential examples.

Influence functions (IF) have been seen as a technique for explaining model predictions through the lens of the training data. Their utility is assumed to be in identifying training examples "responsible" for a prediction so that, for example, correcting a prediction is possible by intervening on those examples (removing or editing them) and retraining the model. However, recent empirical studies have shown that the existing methods of estimating IF predict the leave-one-out-and-retrain effect poorly. In order to understand the mismatch between the theoretical promise and the practical results, we analyse five assumptions made by IF methods which are problematic for modern-scale deep neural networks and which concern convexity, numeric stability, training trajectory and parameter divergence. This allows us to clarify what can be expected theoretically from IF. We show that while most assumptions can be addressed successfully, the parameter divergence poses a clear limitation on the predictive power of IF: influence fades over training time even with deterministic training. We illustrate this theoretical result with BERT and ResNet models. Another conclusion from the theoretical analysis is that IF are still useful for model debugging and correcting even though some of the assumptions made in prior work do not hold: using natural language processing and computer vision tasks, we verify that mis-predictions can be successfully corrected by taking only a few fine-tuning steps on influential examples.
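To make the leave-one-out-and-retrain comparison concrete, the following is a minimal sketch on a toy problem, not the paper's experimental setup. It assumes an L2-regularised logistic regression, where the loss is strongly convex and the Hessian can be inverted exactly, so the classical influence estimate is expected to track retraining closely. The deep, non-convex models discussed in the abstract are exactly where this agreement is reported to break down. All names here (`fit`, `loss`, `predicted`, `actual`, the synthetic data) are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit(X, y, w, lam, iters=50):
    """Newton's method for weighted, L2-regularised logistic regression.

    Minimises (1/n) * sum_i w_i * loss_i(theta) + (lam/2) * ||theta||^2.
    Setting w_j = 0 removes example j while keeping the 1/n scaling.
    """
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (w * (p - y)) / n + lam * theta
        H = (X * (w * p * (1 - p))[:, None]).T @ X / n + lam * np.eye(d)
        theta -= np.linalg.solve(H, grad)
    return theta

def loss(theta, x, y):
    """Cross-entropy loss of a single example."""
    p = sigmoid(x @ theta)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# Synthetic data (assumption: small, noisy, linearly generated labels).
rng = np.random.default_rng(0)
n, d, lam = 60, 3, 0.1
X = rng.normal(size=(n, d))
y = (X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n) > 0).astype(float)
x_test, y_test = rng.normal(size=d), 1.0

# Train on the full data set and form the Hessian at the optimum.
theta = fit(X, y, np.ones(n), lam)
p = sigmoid(X @ theta)
H = (X * (p * (1 - p))[:, None]).T @ X / n + lam * np.eye(d)

# Influence estimate of the change in test loss when example j is removed:
#   (1/n) * grad_test^T  H^{-1}  grad_j
g_test = (sigmoid(x_test @ theta) - y_test) * x_test
G = (p - y)[:, None] * X                      # per-example gradients, shape (n, d)
predicted = G @ np.linalg.solve(H, g_test) / n

# Ground truth: actually remove each example, retrain, and measure the change.
base = loss(theta, x_test, y_test)
actual = np.empty(n)
for j in range(n):
    w = np.ones(n)
    w[j] = 0.0
    actual[j] = loss(fit(X, y, w, lam), x_test, y_test) - base

corr = np.corrcoef(predicted, actual)[0, 1]
print("correlation between IF estimate and retraining:", corr)
```

In this convex setting the two quantities should agree closely. The abstract's point is that, for models such as BERT and ResNet, the analogous comparison degrades because the retrained parameters diverge from the original ones over training time.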

