SE AI CL May 22

TRACER: A Semantic-Aware Framework for Fine-Grained Contamination Detection in Code LLMs

arXiv:2605.24079 90.4
AI Analysis

For researchers and practitioners evaluating code LLMs, TRACER offers a more accurate, fine-grained way to detect data contamination, which undermines the reliability of benchmark results.

The paper introduces TRACER, a semantic-aware framework for fine-grained contamination detection in code LLMs. With GPT-5 as the backbone, it reaches an F1 score of 0.91 for fine-grained detection; in the binary setting it attains an F1 of 0.92, outperforming existing methods by 42%-217%.

Data contamination is a known threat to the reliability of model evaluation. However, it remains underexplored in code large language models (LLMs), where contamination often goes beyond exact duplication. We present TRACER, a semantic-aware framework for fine-grained code contamination detection. TRACER models contamination using three levels of semantic overlap - Functionally Identical, Nearly Identical, and Shared Logic - and detects them through a coarse-to-fine pipeline. We also introduce the first benchmark for fine-grained code contamination detection, spanning three widely used benchmarks and three representative post-training datasets. TRACER achieves strong and consistent performance across multiple LLM backbones, with GPT-5 reaching an F1 score of 0.91 in fine-grained detection. In the binary setting, TRACER attains an F1 of 0.92, outperforming existing methods by 42%-217%. We further conduct ablation studies and error analysis to assess the contributions of individual components in TRACER.
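
The abstract names a coarse-to-fine pipeline and three overlap levels but does not spell out the components. The following is a minimal illustrative sketch of that general shape, not the authors' implementation. Everything in it is an assumption: the lexical Jaccard pre-filter stands in for whatever coarse retrieval TRACER uses, `toy_judge` is a hypothetical placeholder for the LLM-based semantic judgment, and the thresholds are arbitrary.

```python
from __future__ import annotations

import re
from enum import Enum
from typing import Callable, Iterable


class Overlap(Enum):
    """The three overlap levels named in the abstract, plus a 'none' label."""
    NONE = "No Contamination"
    SHARED_LOGIC = "Shared Logic"
    NEARLY_IDENTICAL = "Nearly Identical"
    FUNCTIONALLY_IDENTICAL = "Functionally Identical"


# Assumed severity order, used only to keep the strongest label found.
SEVERITY = {
    Overlap.NONE: 0,
    Overlap.SHARED_LOGIC: 1,
    Overlap.NEARLY_IDENTICAL: 2,
    Overlap.FUNCTIONALLY_IDENTICAL: 3,
}


def tokens(code: str) -> set[str]:
    """Split code into identifiers, numbers, and single punctuation marks."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+|\S", code))


def jaccard(a: str, b: str) -> float:
    """Lexical token-set similarity (a stand-in; it is not semantic)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def coarse_filter(item: str, corpus: Iterable[str], threshold: float) -> list[str]:
    """Coarse stage: cheaply discard training samples that look unrelated."""
    return [c for c in corpus if jaccard(item, c) >= threshold]


def toy_judge(item: str, candidate: str) -> Overlap:
    """Hypothetical fine-stage judge; a real one would reason about semantics."""
    sim = jaccard(item, candidate)
    if sim >= 0.9:
        return Overlap.FUNCTIONALLY_IDENTICAL
    if sim >= 0.6:
        return Overlap.NEARLY_IDENTICAL
    return Overlap.SHARED_LOGIC


def detect(
    item: str,
    corpus: Iterable[str],
    judge: Callable[[str, str], Overlap] = toy_judge,
    threshold: float = 0.3,
) -> Overlap:
    """Run the coarse filter, judge each survivor, return the strongest label."""
    best = Overlap.NONE
    for candidate in coarse_filter(item, corpus, threshold):
        label = judge(item, candidate)
        if SEVERITY[label] > SEVERITY[best]:
            best = label
    return best


if __name__ == "__main__":
    benchmark_item = "def add(a, b):\n    return a + b"
    training_corpus = [benchmark_item, "print('hello world')"]
    print(detect(benchmark_item, training_corpus).value)
```

The point of the two-stage design is cost: the cheap filter shrinks the candidate set so that the expensive judgment runs on only a few pairs. Collapsing every label other than `Overlap.NONE` into "contaminated" gives the binary setting.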

