ML · LG · Jan 6, 2025

Group Shapley with Robust Significance Testing and Its Application to Bond Recovery Rate Prediction

arXiv:2501.03041v1 · 2 citations · h-index: 2
AI Analysis

This work addresses the need for explainable AI in business and economic contexts by providing a method to assess the importance of feature groups. The contribution is incremental, since it builds on existing Shapley value frameworks.

The authors address the problem of evaluating feature group importance in structured data. They propose Group Shapley, a metric that extends individual Shapley values to feature groups, and develop a robust significance testing procedure based on a three-cumulant chi-square approximation, which outperformed alternatives such as the Wald test in simulations. They apply the method to bond recovery rate prediction using a dataset of 2,094 observations and 98 features, identify market-related variables as the most influential group, and use Lorenz curves and Gini indices to show that Group Shapley assigns importance more equitably than individual Shapley values.
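For readers unfamiliar with the Gini index as a measure of how evenly importance is spread, here is a minimal Python sketch. It uses the standard sorted-values formula on absolute importance scores. The function name `gini` and the sample inputs are illustrative assumptions, not taken from the paper, and the paper's exact computation may differ.

```python
def gini(importances):
    """Gini index of a list of importance scores (0 = perfectly even).

    Assumption: importance is measured by absolute value, so negative
    attributions count by magnitude. This is a generic formula, not
    necessarily the one used in the paper.
    """
    xs = sorted(abs(x) for x in importances)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending-sorted values (ranks start at 1).
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Hypothetical inputs: evenly spread importance vs. fully concentrated.
even = gini([1.0, 1.0, 1.0, 1.0])
concentrated = gini([0.0, 0.0, 0.0, 1.0])
```

A lower value means importance is shared more evenly across features or groups. With n items, the maximum under this formula is (n - 1) / n, reached when a single item carries all of the importance.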

We propose Group Shapley, a metric that extends the classical individual-level Shapley value framework to evaluate the importance of feature groups, addressing the structured nature of predictors commonly found in business and economic data. More importantly, we develop a significance testing procedure based on a three-cumulant chi-square approximation and establish the asymptotic properties of the test statistics for Group Shapley values. Our approach can effectively handle challenging scenarios, including sparse or skewed distributions and small sample sizes, outperforming alternative tests such as the Wald test. Simulations confirm that the proposed test maintains robust empirical size and demonstrates enhanced power under diverse conditions. To illustrate the method's practical relevance in advancing Explainable AI, we apply our framework to bond recovery rate predictions using a global dataset (1996-2023) comprising 2,094 observations and 98 features, grouped into 16 subgroups and five broader categories: bond characteristics, firm fundamentals, industry-specific factors, market-related variables, and macroeconomic indicators. Our results identify the market-related variables group as the most influential. Furthermore, Lorenz curves and Gini indices reveal that Group Shapley assigns feature importance more equitably than individual Shapley values.
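The idea of treating each feature group as a single "player" can be sketched with the classical Shapley formula. The Python example below is a minimal illustration under stated assumptions: the abstract does not give the paper's exact definition or value function, so `group_shapley`, `toy_value`, and the numeric contributions are all hypothetical, and the exact enumeration shown here is practical only for a small number of groups.

```python
from itertools import combinations
from math import factorial


def group_shapley(groups, value):
    """Exact Shapley values with each feature group acting as one player.

    groups: list of hashable group names.
    value:  callable mapping a frozenset of group names to a float,
            e.g. the predictive performance of a model restricted to
            those groups (an assumption; the paper may define it
            differently).
    """
    n = len(groups)
    phi = {}
    for g in groups:
        others = [h for h in groups if h != g]
        total = 0.0
        for k in range(n):
            # Classical Shapley weight for coalitions of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {g}) - value(s))
        phi[g] = total
    return phi


# Hypothetical toy value function: additive group effects plus one
# interaction between the "market" and "macro" groups.
BASE = {"bond": 0.1, "market": 0.5, "macro": 0.2}


def toy_value(coalition):
    v = sum(BASE[g] for g in coalition)
    if "market" in coalition and "macro" in coalition:
        v += 0.2
    return v


phi = group_shapley(list(BASE), toy_value)
```

In this toy setup the interaction term is split equally between the two groups that create it, so "market" receives about 0.6 and "macro" about 0.3, while the values sum to the full-coalition value minus the empty-coalition value (the efficiency property).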
