Finding Most Influential Sets

arXiv:2606.0591924.4

Predicted impact top 43% in ML · last 90 daysOriginality Highly original

AI Analysis

It provides a computationally feasible method for a previously intractable combinatorial selection problem in causal inference and robust statistics.

The paper shows that for estimands with linear-fractional leave-set-out effects, identifying the most influential set reduces to a one-parameter sequence of top-k problems, enabling an O(n) algorithm with finite termination that recovers exact MIS previously computationally inaccessible.

Identifying most influential sets (MIS) - size-$k$ subsets whose removal maximally changes a target estimand - is typically infeasible because it requires searching over $\binom{n}{k}$ subsets. For estimands with linear-fractional leave-set-out effects, we show that MIS selection reduces to a one-parameter sequence of top-$k$ problems. Dinkelbach's method yields an algorithm with $\mathcal{O}(n)$ cost per iteration and finite termination. For fixed residualized inputs, the algorithm returns a globally optimal set for the univariate ratio objective, including the oracle-residualized partial linear model. With estimated nuisance functions, uniform denominator and generated-score stability imply approximation to the first-order oracle orthogonal-score objective; exact set recovery follows under a separation condition. Simulations and applications show that the method recovers exact MIS that were previously computationally inaccessible.

View on arXiv PDF

Similar