On Pitfalls of $\textit{RemOve-And-Retrain}$: Data Processing Inequality Perspective
This work highlights a critical pitfall in a widely used benchmarking method for explainable deep learning, warning researchers against its indiscriminate use.
The paper identifies a flaw in the RemOve-And-Retrain (ROAR) procedure for evaluating feature importance approximations: attributions that carry less information can score better in ROAR benchmarks, which contradicts the benchmark's intended purpose. The same issue affects the variant RemOve-And-Debias (ROAD).
Methods for evaluating feature importance approximations, also known as attribution methods, have been developed in a wide range of contexts. Developing robust techniques for performance benchmarking is a critical concern in explainable deep learning. This study examines the reliability of the RemOve-And-Retrain (ROAR) procedure, which is widely used to measure the performance of feature importance estimates. Our theoretical analysis and empirical results show that attributions containing less information about the decision function may score better in ROAR benchmarks, contradicting the original intent of ROAR. The same phenomenon occurs in the recently introduced variant RemOve-And-Debias (ROAD), and we posit a persistent blurriness bias in ROAR attribution metrics. Our findings caution against the indiscriminate use of ROAR metrics.
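For readers unfamiliar with the benchmark, the remove-and-retrain loop can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `roar_mask` and `roar_curve`, the flat feature layout, the choice of the training-set mean as the fill value, and the user-supplied `fit_and_score` callback are all hypothetical.

```python
import numpy as np

def roar_mask(X, attributions, fraction, fill_value=None):
    """Replace the top `fraction` of features in each sample, ranked by
    attribution score, with an uninformative constant."""
    X = np.asarray(X, dtype=float)
    A = np.asarray(attributions, dtype=float)
    n, d = X.shape
    k = int(round(fraction * d))  # number of features removed per sample
    if fill_value is None:
        fill_value = X.mean()  # assumption: global mean as the fill value
    X_masked = X.copy()
    if k == 0:
        return X_masked
    # Column indices of the k largest attributions in each row.
    top = np.argsort(-A, axis=1, kind="stable")[:, :k]
    rows = np.arange(n)[:, None]
    X_masked[rows, top] = fill_value
    return X_masked

def roar_curve(X_train, y_train, X_test, y_test,
               attr_train, attr_test, fractions, fit_and_score):
    """For each removal fraction, mask both splits, retrain from scratch,
    and record the test score returned by `fit_and_score`."""
    fill = float(np.asarray(X_train, dtype=float).mean())
    scores = []
    for f in fractions:
        Xtr = roar_mask(X_train, attr_train, f, fill)
        Xte = roar_mask(X_test, attr_test, f, fill)
        # `fit_and_score` is a hypothetical callback: it trains a fresh
        # model on (Xtr, y_train) and returns its score on (Xte, y_test).
        scores.append(fit_and_score(Xtr, y_train, Xte, y_test))
    return scores
```

Under the usual reading of ROAR, a steeper drop in the retrained score as the removal fraction grows is taken as evidence of a better attribution. The abstract's warning is that this reading can favor attributions that carry less information about the decision function.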