Optimizing Token Choice for Code Watermarking: An RL Approach
The work addresses the need for effective intellectual property protection in code generation, though the contribution appears incremental: it is an adaptive method that builds on existing watermarking concepts.
The paper tackles the problem of watermarking LLM-generated code by introducing CodeTracer, a reinforcement learning framework that biases token choices to embed watermarks. The authors report that it outperforms state-of-the-art baselines in both watermark detectability and preservation of code functionality.
Protecting intellectual property on LLM-generated code necessitates effective watermarking systems that can operate within code's highly structured, syntactically constrained nature. In this work, we introduce CodeTracer, an innovative adaptive code watermarking framework underpinned by a novel reinforcement learning training paradigm. At its core, CodeTracer features a policy-driven approach that utilizes a parameterized model to intelligently bias token choices during next-token prediction. This strategy ensures that embedded watermarks maintain code functionality while exhibiting subtle yet statistically detectable deviations from typical token distributions. To facilitate policy learning, we devise a comprehensive reward system that seamlessly integrates execution feedback with watermark embedding signals, balancing process-level and outcome-level rewards. Additionally, we employ Gumbel Top-k reparameterization to enable gradient-based optimization of discrete watermarking decisions. Extensive comparative evaluations demonstrate CodeTracer's significant superiority over state-of-the-art baselines in both watermark detectability and the preservation of generated code's functionality.
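To make the token-biasing idea more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a policy has already produced one score per vocabulary entry, uses the Gumbel Top-k trick to sample k tokens to favor, and adds a fixed bonus to their logits before the next-token softmax. The function names (`gumbel_top_k`, `bias_logits`), the bonus `delta`, and the toy vocabulary are all hypothetical. The sketch covers only the forward sampling step; the gradient-based optimization mentioned in the abstract would additionally need a relaxed or straight-through estimator, which is not shown here.

```python
import math
import random


def gumbel_top_k(scores, k, rng):
    """Sample k distinct indices by adding Gumbel(0, 1) noise to each score
    and keeping the k largest perturbed values (sampling without replacement)."""
    perturbed = []
    for index, score in enumerate(scores):
        # Clamp the uniform draw away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        perturbed.append((score + gumbel_noise, index))
    perturbed.sort(reverse=True)
    return sorted(index for _, index in perturbed[:k])


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bias_logits(logits, favored, delta):
    """Add a bonus delta to the logits of the favored token indices."""
    favored_set = set(favored)
    return [x + delta if i in favored_set else x for i, x in enumerate(logits)]


if __name__ == "__main__":
    rng = random.Random(0)
    lm_logits = [2.0, 1.0, 0.5, 0.1, -1.0]      # toy next-token logits (assumed)
    policy_scores = [0.3, 1.2, -0.4, 0.9, 0.0]  # toy policy outputs (assumed)
    favored = gumbel_top_k(policy_scores, k=2, rng=rng)
    watermarked = softmax(bias_logits(lm_logits, favored, delta=1.5))
    print("favored token indices:", favored)
    print("watermarked distribution:", [round(p, 3) for p in watermarked])
```

A larger `delta` would make the shift toward favored tokens easier to detect statistically but risks pushing generation toward tokens that break the code, which is the trade-off the abstract's reward design is described as balancing.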