MaskDiME: Adaptive Masked Diffusion for Precise and Efficient Visual Counterfactual Explanations
MaskDiME offers a practical and generalizable solution for efficient counterfactual explanation of deep neural networks, though the contribution is incremental because it builds on existing diffusion methods.
The paper tackles the problem of slow and imprecise diffusion-based visual counterfactual explanations by proposing MaskDiME, which achieves over 30x faster inference than the baseline method and comparable or state-of-the-art performance across five benchmark datasets.
Visual counterfactual explanations aim to reveal the minimal semantic modifications that can alter a model's prediction, providing causal and interpretable insights into deep neural networks. However, existing diffusion-based counterfactual generation methods are often computationally expensive, slow to sample, and imprecise in localizing the modified regions. To address these limitations, we propose MaskDiME, a simple, fast, and effective diffusion framework that unifies semantic consistency and spatial precision through localized sampling. Our approach adaptively focuses on decision-relevant regions, producing localized and semantically consistent counterfactuals while preserving high image fidelity. MaskDiME is training-free, runs inference over 30x faster than the baseline method, and achieves comparable or state-of-the-art performance across five benchmark datasets spanning diverse visual domains, establishing a practical and generalizable solution for efficient counterfactual explanation.
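The abstract does not specify how the decision-relevant regions are selected or how the edit is restricted to them. As a rough illustration of the general idea of localized editing, and not of the MaskDiME algorithm itself, the sketch below builds a binary mask from the highest-valued entries of a saliency map and uses it to blend an edited image into the original. The function names, the `fraction` parameter, and the use of a top-fraction threshold are all assumptions made for this example.

```python
# Illustrative sketch only: the helper names, the saliency input, and the
# top-fraction thresholding rule are assumptions, not taken from the paper.

def top_fraction_mask(saliency, fraction):
    """Binary mask marking roughly the `fraction` of pixels with the highest saliency.

    Ties at the threshold value are all included, so slightly more than
    `fraction` of the pixels may be selected.
    """
    flat = sorted((v for row in saliency for v in row), reverse=True)
    k = max(1, int(round(fraction * len(flat))))
    threshold = flat[k - 1]
    return [[1.0 if v >= threshold else 0.0 for v in row] for row in saliency]


def blend(original, edited, mask):
    """Take edited pixels where mask is 1 and keep original pixels where mask is 0."""
    return [
        [m * e + (1.0 - m) * o for o, e, m in zip(row_o, row_e, row_m)]
        for row_o, row_e, row_m in zip(original, edited, mask)
    ]


# Toy 2x2 grayscale example with a hypothetical saliency map.
original = [[0.0, 0.0], [0.0, 0.0]]
edited = [[1.0, 1.0], [1.0, 1.0]]
saliency = [[0.9, 0.1], [0.2, 0.3]]

mask = top_fraction_mask(saliency, fraction=0.25)
result = blend(original, edited, mask)
```

In this toy case only the single most salient pixel takes its value from the edited image and the other three stay identical to the original, which is the kind of spatial precision the abstract describes. In a diffusion setting, a blend of this form would presumably be applied during sampling instead of once on the final image, but the abstract gives no detail on that point.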