CV, CR, LG | Jun 29, 2025

Learning Counterfactually Decoupled Attention for Open-World Model Attribution

arXiv:2506.23074v1 | 3 citations | h-index: 26 | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses model attribution in open-world scenarios for AI security, but the advance is incremental because it builds on existing benchmarks and methods.

The paper tackles open-world model attribution, where existing methods struggle with spurious correlations and novel attacks. It proposes a counterfactually decoupled attention learning method that improves state-of-the-art models with minimal computational overhead, particularly on unseen attacks.

In this paper, we propose a Counterfactually Decoupled Attention Learning (CDAL) method for open-world model attribution. Existing methods rely on handcrafted designs of region partitioning or feature space, which can be confounded by spurious statistical correlations and struggle with novel attacks in open-world scenarios. To address this, CDAL explicitly models the causal relationships between the attentional visual traces and source model attribution, and counterfactually decouples the discriminative model-specific artifacts from confounding source biases for comparison. The resulting causal effect quantifies the quality of the learned attention maps, and maximizing this effect encourages the network to capture essential generation patterns that generalize to unseen source models. Extensive experiments on existing open-world model attribution benchmarks show that, with minimal computational overhead, our method consistently improves state-of-the-art models by large margins, particularly for unseen novel attacks. Source code: https://github.com/yzheng97/CDAL.
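
The idea of scoring an attention map by its causal effect can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes a simplified setting: region features are pooled by attention weights, a linear head produces one logit per candidate source model, and the counterfactual is a uniform attention map. The effect is the difference between the factual and counterfactual logits, and training would minimize a cross-entropy on that difference. All function names, the uniform counterfactual, and the linear head are illustrative assumptions.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(features, weights):
    """Attention-weighted pooling: sum_i weights[i] * features[i]."""
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]

def classify(pooled, class_weights):
    """Hypothetical linear head: one logit per candidate source model."""
    return [sum(w * x for w, x in zip(row, pooled)) for row in class_weights]

def counterfactual_effect(features, attn_scores, class_weights):
    """Factual logits minus logits under a counterfactual attention map.

    Assumption: the counterfactual attention is uniform over regions,
    so the effect isolates what the learned attention contributes.
    """
    factual_attn = softmax(attn_scores)
    counterfactual_attn = [1.0 / len(features)] * len(features)
    y_fact = classify(attend(features, factual_attn), class_weights)
    y_cf = classify(attend(features, counterfactual_attn), class_weights)
    return [a - b for a, b in zip(y_fact, y_cf)]

def effect_loss(effect, label):
    """Cross-entropy on the effect; minimizing it maximizes the
    effect's support for the true source model."""
    return -math.log(softmax(effect)[label])

# Toy example: two regions, two candidate source models.
features = [[1.0, 0.0], [0.0, 1.0]]
class_weights = [[1.0, 0.0], [0.0, 1.0]]

uniform_loss = effect_loss(
    counterfactual_effect(features, [0.0, 0.0], class_weights), 0)
focused_loss = effect_loss(
    counterfactual_effect(features, [2.0, 0.0], class_weights), 0)
```

In this toy setup, an attention map identical to the counterfactual yields a zero effect and therefore an uninformative loss, while attention that concentrates on the region tied to the true source model yields a lower loss. That is the sense in which the effect acts as a quality signal for the attention map.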

Code Implementations: 1 repo