LG · AI · May 15

Interaction-Aware Influence Functions for Group Attribution

Jaeseung Heo, Kyeongheung Yun, Youngbin Choi, Sehyun Hwang, Jungseul Ok, Dongwoo Kim

arXiv:2605.15675 · 71.8

Predicted impact: top 23% in LG · last 90 days
Originality: Highly original

AI Analysis

For practitioners using influence functions for data attribution or selection, this provides a more accurate method that accounts for example interactions, with demonstrated gains in both model analysis and instruction tuning.

Influence functions for groups of examples typically sum individual influences, missing interactions. The authors propose a second-order interaction-aware influence function that captures pairwise effects, outperforming first-order methods in tracking leave-group-out retraining across six models and improving instruction-tuning data selection for Llama-3.1-8B on five of seven tasks.

Influence functions approximate how removing a training example changes a quantity of interest, called the target function, such as a held-out loss. To estimate the influence of a group of examples, the standard practice is to sum the individual influences of its members. However, this sum does not capture how examples jointly affect the target: a pair of examples may be redundant or complementary, but the sum cannot distinguish these cases. We propose an interaction-aware influence function that characterizes how interactions between examples influence the target. By expanding the target to second order around the trained parameters, we obtain an estimator that augments the standard sum with a pairwise interaction term that captures the alignment between two examples' effects on the target. We empirically evaluate our estimator in two settings. First, on six dataset-model pairs spanning logistic regression, MLPs, and ResNet-9, our estimator tracks leave-group-out retraining substantially better than first-order influence across all settings. Second, when used as a greedy selection rule for instruction-tuning data on Llama-3.1-8B, it beats prior influence-based and representation-similarity baselines on five of seven downstream tasks, in a regime where standard influence-based selection underperforms random selection.
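The difference between the plain sum and a second-order expansion can be illustrated on a toy problem. The following is a minimal sketch, not the paper's estimator: it assumes ridge regression with a single held-out squared loss as the target, uses the usual inverse-Hessian parameter shift for each removed example, and applies a generic second-order Taylor expansion of the target around the trained parameters. All function names (`fit_ridge`, `param_shifts`, `group_estimates`) and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Minimize 0.5*||X theta - y||^2 + 0.5*lam*||theta||^2; return theta and the Hessian."""
    H = X.T @ X + lam * np.eye(X.shape[1])
    theta = np.linalg.solve(H, X.T @ y)
    return theta, H

def target(theta, x_t, y_t):
    """Target function: squared loss on one held-out point."""
    return 0.5 * (x_t @ theta - y_t) ** 2

def param_shifts(X, y, theta, H):
    """First-order parameter change from removing each example: H^{-1} g_i."""
    residuals = X @ theta - y               # shape (n,)
    grads = X * residuals[:, None]          # per-example loss gradients, shape (n, d)
    return np.linalg.solve(H, grads.T).T    # row i is the shift for example i

def group_estimates(shifts, group, theta, x_t, y_t):
    """Return (first-order sum, sum plus pairwise term) for removing `group`."""
    grad_f = (x_t @ theta - y_t) * x_t      # gradient of the target at theta
    hess_f = np.outer(x_t, x_t)             # Hessian of the target
    D = shifts[group]                       # shape (k, d)
    individual = D @ grad_f                 # first-order influence of each member
    pairwise = D @ hess_f @ D.T             # entry (i, j) couples members i and j (i == j included)
    first_order = individual.sum()
    second_order = first_order + 0.5 * pairwise.sum()
    return first_order, second_order

# Synthetic data (assumption: purely illustrative, not from the paper)
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=60)
x_t, y_t = rng.normal(size=5), 0.3
lam = 1.0

theta, H = fit_ridge(X, y, lam)
shifts = param_shifts(X, y, theta, H)
group = [0, 1, 2, 3]
first_order, second_order = group_estimates(shifts, group, theta, x_t, y_t)

# Ground truth: retrain without the group and measure the change in the target
keep = np.setdiff1d(np.arange(len(y)), group)
theta_retrained, _ = fit_ridge(X[keep], y[keep], lam)
actual = target(theta_retrained, x_t, y_t) - target(theta, x_t, y_t)

print(f"first-order sum:        {first_order:+.6f}")
print(f"with pairwise term:     {second_order:+.6f}")
print(f"leave-group-out actual: {actual:+.6f}")
```

In this toy setting the pairwise entry for members i and j reduces to the product of their individual effects on the held-out prediction, so it is positive when the two examples push the prediction in the same direction and negative when they oppose each other. That is one concrete reading of "alignment between two examples' effects on the target"; the paper's actual interaction term may be defined differently.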

