Finding Most Influential Sets

arXiv:2606.0591924.4
Predicted impact top 43% in ML · last 90 daysOriginality Highly original
AI Analysis

It provides a computationally feasible method for a previously intractable combinatorial selection problem in causal inference and robust statistics.

The paper shows that for estimands with linear-fractional leave-set-out effects, identifying the most influential set reduces to a one-parameter sequence of top-k problems, enabling an O(n) algorithm with finite termination that recovers exact MIS previously computationally inaccessible.

Identifying most influential sets (MIS) - size-$k$ subsets whose removal maximally changes a target estimand - is typically infeasible because it requires searching over $\binom{n}{k}$ subsets. For estimands with linear-fractional leave-set-out effects, we show that MIS selection reduces to a one-parameter sequence of top-$k$ problems. Dinkelbach's method yields an algorithm with $\mathcal{O}(n)$ cost per iteration and finite termination. For fixed residualized inputs, the algorithm returns a globally optimal set for the univariate ratio objective, including the oracle-residualized partial linear model. With estimated nuisance functions, uniform denominator and generated-score stability imply approximation to the first-order oracle orthogonal-score objective; exact set recovery follows under a separation condition. Simulations and applications show that the method recovers exact MIS that were previously computationally inaccessible.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes