CRAI · Dec 14, 2025

CODE ACROSTIC: Robust Watermarking for Code Generation

arXiv:2512.14753v1
Originality: Incremental advance
AI Analysis

This work addresses the need for robust intellectual-property protection in AI-generated code, though it appears to be an incremental advance that improves upon existing watermarking methods.

The paper tackles the vulnerability of existing watermarks for LLM-generated code to comment removal attacks. It proposes a method that uses a Cue List to distinguish the low- and high-entropy parts of code, and reports higher detectability and usability than existing techniques when evaluated on HumanEval.
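To make the comment removal attack concrete, here is a minimal sketch. It assumes the generated code is Python and that part of the watermark signal sits in `#` comments, and it uses the standard-library `tokenize` module to drop every comment token. The helper `strip_comments` and the sample function `add` are hypothetical illustrations, not code from the paper.

```python
import io
import tokenize


def strip_comments(source: str) -> str:
    """Drop every '#' comment token from Python source; all code tokens are kept.

    untokenize() with full token tuples preserves original positions, so the
    removed comments leave only trailing spaces behind.
    """
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    kept = [tok for tok in tokens if tok.type != tokenize.COMMENT]
    return tokenize.untokenize(kept)


# Hypothetical "watermarked" output: the comments stand in for a comment-borne signal.
watermarked = '''
def add(a, b):
    # sum the two inputs (a watermark could hide in comments like this)
    return a + b  # return the result
'''

cleaned = strip_comments(watermarked)
print(cleaned)

# The attack does not change behaviour: both versions define an equivalent function.
ns_orig, ns_clean = {}, {}
exec(watermarked, ns_orig)
exec(cleaned, ns_clean)
assert ns_orig["add"](2, 3) == ns_clean["add"](2, 3) == 5
```

Because comments never reach the interpreter, any watermark signal carried in them disappears while the code still behaves identically, which is why comment-based signals offer little protection against this attack.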

Watermarking large language models (LLMs) is vital for preventing their misuse, including the fabrication of fake news, plagiarism, and spam. It is especially important to watermark LLM-generated code, as it often contains intellectual property. However, we found that existing methods for watermarking LLM-generated code fail to address the comment removal attack. In such cases, an attacker can simply remove the comments from the generated code without affecting its functionality, significantly reducing the effectiveness of current code-watermarking techniques. On the other hand, injecting a watermark into code is challenging because, as previous works have noted, most code is low-entropy compared with natural language. Our approach leverages prior knowledge, encoded as a Cue List of words, to distinguish between the low-entropy and high-entropy parts of the code. We then inject the watermark guided by this Cue List, achieving higher detectability and usability than existing methods. We evaluated our proposed method on HumanEval and compared it with three state-of-the-art code watermarking techniques. The results demonstrate the effectiveness of our approach.
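The abstract does not specify how the Cue List guides injection or detection. The sketch below is one hypothetical reading, not the authors' method: a green-list watermark in the style of Kirchenbauer et al. (2023), applied only at positions that immediately follow a cue word, so that low-entropy syntax is never biased or scored. The cue words, the key, `GAMMA`, and the helpers `is_green`, `choose`, and `z_score` are all assumptions. Tokens are whitespace-level words rather than model vocabulary tokens, and `choose` makes a hard selection instead of adding a bias to logits.

```python
import hashlib
import math

GAMMA = 0.5  # assumed green-list fraction (gamma in Kirchenbauer et al.)
KEY = "secret-key"  # hypothetical watermark key
# Hypothetical cue words: the position right after each one is treated as high-entropy
# (e.g. a freshly chosen identifier). The paper's actual Cue List is not reproduced here.
CUE_LIST = {"def", "class", "for", "as", "="}


def is_green(prev: str, tok: str, key: str = KEY) -> bool:
    """Pseudo-randomly place `tok` on the green list, seeded by the key and the previous token."""
    digest = hashlib.sha256(f"{key}|{prev}|{tok}".encode()).digest()
    return digest[0] < int(256 * GAMMA)  # true for about a GAMMA fraction of tokens


def choose(prev: str, candidates: list[str]) -> str:
    """Toy embedding: after a cue word, pick the first green candidate; elsewhere, no bias."""
    if prev in CUE_LIST:
        for cand in candidates:
            if is_green(prev, cand):
                return cand
    return candidates[0]


def z_score(tokens: list[str]) -> float:
    """Score only cue-guided positions; tokens elsewhere are never counted."""
    pairs = [(p, t) for p, t in zip(tokens, tokens[1:]) if p in CUE_LIST]
    n = len(pairs)
    if n == 0:
        return 0.0
    green = sum(is_green(p, t) for p, t in pairs)
    return (green - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


# Toy token stream: 20 function definitions whose names are chosen at cue positions.
tokens: list[str] = []
for i in range(20):
    tokens.append("def")
    tokens.append(choose("def", [f"helper_{i}_{j}" for j in range(8)]))
    tokens += ["(", ")", ":", "pass"]

print(f"z = {z_score(tokens):.2f}")  # a large positive z suggests the watermark is present
```

Restricting both embedding and scoring to cue positions is the design choice this sketch illustrates: the statistic is computed only where the generator plausibly had freedom to choose, and the score never depends on comments, so stripping them leaves it unchanged.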
