SE, LG
Oct 15, 2025

Signature in Code Backdoor Detection, how far are we?

arXiv:2510.13992v1
h-index: 12
Originality: Incremental advance
AI Analysis

This work addresses the challenge of backdoor detection in code models for software security, but it is incremental as it builds on existing Spectral Signature methods.

The paper investigates how effective Spectral Signature defenses are at detecting backdoor attacks in code models. It finds that the standard settings are often suboptimal and proposes a new proxy metric that estimates defense performance more accurately without retraining the model.

As Large Language Models (LLMs) become increasingly integrated into software development workflows, they also become prime targets for adversarial attacks. Among these, backdoor attacks are a significant threat, allowing attackers to manipulate model outputs through hidden triggers embedded in training data. Detecting such backdoors remains a challenge, and one promising approach is the use of Spectral Signature defense methods that identify poisoned data by analyzing feature representations through eigenvectors. While some prior works have explored Spectral Signatures for backdoor detection in neural networks, recent studies suggest that these methods may not be optimally effective for code models. In this paper, we revisit the applicability of Spectral Signature-based defenses in the context of backdoor attacks on code models. We systematically evaluate their effectiveness under various attack scenarios and defense configurations, analyzing their strengths and limitations. We found that the widely used setting of Spectral Signature in code backdoor detection is often suboptimal. Hence, we explored the impact of different settings of the key factors. We discovered a new proxy metric that can more accurately estimate the actual performance of Spectral Signature without model retraining after the defense.
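To make the idea of "analyzing feature representations through eigenvectors" concrete, here is a minimal sketch of the general Spectral Signature recipe: center the representations, find the top singular direction, and score each sample by its squared projection onto that direction. It is not the configuration evaluated in the paper. The function names (`spectral_scores`, `flag_suspects`, `make_demo_data`), the `removal_ratio` parameter, the use of only the top singular vector, and the synthetic data are all assumptions made for illustration. A pure-Python power iteration stands in for the SVD call a real pipeline would likely use.

```python
import random


def spectral_scores(reps, iters=50):
    """Score each sample by its squared projection onto the top
    singular direction of the mean-centered representation matrix."""
    n, d = len(reps), len(reps[0])
    mean = [sum(row[j] for row in reps) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in reps]

    # Power iteration on M^T M approximates the top right singular vector of M.
    rng = random.Random(0)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        proj = [sum(c[j] * v[j] for j in range(d)) for c in centered]  # M v
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # degenerate case: all representations identical
            break
        v = [x / norm for x in w]

    return [sum(c[j] * v[j] for j in range(d)) ** 2 for c in centered]


def flag_suspects(scores, removal_ratio):
    """Return the indices of the highest-scoring samples.
    removal_ratio is a hypothetical knob: the fraction of data to flag."""
    k = int(len(scores) * removal_ratio)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def make_demo_data(n_clean=95, n_poison=5, dim=8, shift=10.0, seed=0):
    """Synthetic stand-in for model representations (an assumption):
    poisoned samples are shifted along the first feature dimension."""
    rng = random.Random(seed)
    reps = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for _ in range(n_clean + n_poison)]
    poisoned = set(range(n_clean, n_clean + n_poison))
    for i in poisoned:
        reps[i][0] += shift
    return reps, poisoned
```

On the synthetic data, calling `flag_suspects(spectral_scores(reps), removal_ratio=0.075)` flags 7 of 100 samples, following the common heuristic of removing 1.5 times the assumed poisoning rate. Real code-model representations are rarely this cleanly separated, which is why the choice of settings matters.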

