One Permutation Is All You Need: Fast, Reliable Variable Importance and Model Stress-Testing
This work addresses the need for reliable and efficient feature importance estimation in machine learning, which underpins trust, transparency, and regulatory compliance in applications such as finance and credit risk. It offers incremental improvements over existing permutation-based methods.
The paper tackles the computational inefficiency and instability of permutation-based feature importance methods by replacing repeated random permutations with a single deterministic permutation. Across nearly 200 scenarios, the approach is faster and more stable and shows improved bias-variance tradeoffs. The paper also introduces Systemic Variable Importance, a stress-testing metric that accounts for feature correlations to reveal hidden dependencies, and demonstrates it in real-world case studies on fairness and risk assessment.
Reliable estimation of feature contributions in machine learning models is essential for trust, transparency and regulatory compliance, especially when models are proprietary or otherwise operate as black boxes. While permutation-based methods are a standard tool for this task, classical implementations rely on repeated random permutations, introducing computational overhead and stochastic instability. In this paper, we show that by replacing multiple random permutations with a single, deterministic, and optimal permutation, we achieve a method that retains the core principles of permutation-based importance while being non-random, faster, and more stable. We validate this approach across nearly 200 scenarios, including real-world household finance and credit risk applications, demonstrating improved bias-variance tradeoffs and accuracy in challenging regimes such as small sample sizes, high dimensionality, and low signal-to-noise ratios. Finally, we introduce Systemic Variable Importance, a natural extension designed for model stress-testing that explicitly accounts for feature correlations. This framework provides a transparent way to quantify how shocks or perturbations propagate through correlated inputs, revealing dependencies that standard variable importance measures miss. Two real-world case studies demonstrate how this metric can be used to audit models for hidden reliance on protected attributes (e.g., gender or race), enabling regulators and practitioners to assess fairness and systemic risk in a principled and computationally efficient manner.
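To make the contrast in the abstract concrete, the following is a minimal sketch of the two estimation styles: classical importance averaged over repeated random permutations, and importance computed from one fixed permutation. The abstract does not specify how the optimal permutation is constructed, so the cyclic shift by half the sample size used here is only a hypothetical placeholder and not the authors' method. The function names, the mean squared error loss, and the toy linear model are likewise assumptions made for illustration.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error, used here as the loss (an assumption)."""
    return float(np.mean((y_true - y_pred) ** 2))


def random_permutation_importance(predict, X, y, j, n_repeats=30, seed=0):
    """Classical estimate: average loss increase over repeated random shuffles of column j."""
    rng = np.random.default_rng(seed)
    base = mse(y, predict(X))
    increases = []
    for _ in range(n_repeats):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(X[:, j])
        increases.append(mse(y, predict(Xp)) - base)
    return float(np.mean(increases))


def single_permutation_importance(predict, X, y, j):
    """Loss increase under one fixed permutation of column j.

    Placeholder permutation: a cyclic shift by half the sample size.
    This is NOT the paper's optimal permutation, which the abstract
    does not describe; it only illustrates the non-random, single-pass idea.
    """
    Xp = X.copy()
    Xp[:, j] = np.roll(X[:, j], X.shape[0] // 2)
    return mse(y, predict(Xp)) - mse(y, predict(X))


# Toy data: feature 0 matters most, feature 1 a little, feature 2 not at all.
rng = np.random.default_rng(42)
X = rng.standard_normal((500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(500)


def predict(X):
    """Stand-in for a fitted black-box model (hypothetical)."""
    return 3.0 * X[:, 0] + 0.5 * X[:, 1]


for j in range(X.shape[1]):
    rand_imp = random_permutation_importance(predict, X, y, j)
    det_imp = single_permutation_importance(predict, X, y, j)
    print(f"feature {j}: random={rand_imp:.3f}  single={det_imp:.3f}")
```

The single-permutation version calls the model once per feature instead of `n_repeats` times, and it returns the same value on every run without a seed. Those are the speed and stability properties the abstract attributes to the deterministic approach. The accuracy of the estimate depends on the choice of permutation, which this sketch does not attempt to optimize.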