Watermarking Game-Playing Agents in Perfect-Information Extensive-Form Games
The paper addresses the problem of detecting unauthorized AI use on gaming platforms (e.g., online chess), which matters to platform operators, but the adaptation is incremental.
This paper adapts the KGW watermarking technique from LLMs to watermark game-playing agents in perfect-information extensive-form games, enabling detection of unauthorized AI use. Experiments on chess engines show negligible impact on strategy quality and detectability within a few games.
Watermarking techniques for large language models (LLMs), which encode hidden information in the output so that its source can be verified, have recently gained significant attention thanks to their potential to detect accidental or deliberate misuse. Similar challenges involving model misuse also exist in the context of game-playing, such as detecting the unauthorized use of AI tools on gaming platforms (e.g., cheating in online chess). In this paper, we initiate the study of how game-playing strategies can be watermarked. We show how the KGW watermark for LLMs can be adapted to watermark game-playing agents in perfect-information extensive-form games. The watermark can then be detected using a statistical test. We show that the degradation in the quality of the watermarked strategy profile, quantified by the expected utility, can be bounded, but there is a tradeoff between detectability and quality. In our experiments, we bootstrap the watermarking framework to various chess engines and demonstrate that a) the impact of the watermark on the quality of the strategy is negligible and b) the watermark can be detected with just a handful of games.
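To make the idea concrete, the following is a minimal Python sketch of a KGW-style watermark applied to a move distribution, together with the usual one-sided z-test for detection. It is an illustration under stated assumptions, not the paper's exact construction: the function names, the secret `key`, the green-list fraction `gamma`, the bias `delta`, the hash-based green-list assignment, and the encoding of a game state as a string (for chess, something like a FEN string) are all hypothetical choices made for this example.

```python
import hashlib
import math


def is_green(key: str, state: str, action: str, gamma: float = 0.5) -> bool:
    """Pseudo-randomly place an action on the green list with probability gamma.

    Assumption: the assignment is keyed on (secret key, state, action) via SHA-256.
    """
    digest = hashlib.sha256(f"{key}|{state}|{action}".encode()).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    return u < gamma


def watermark_policy(policy, key, state, gamma=0.5, delta=2.0):
    """Reweight a move distribution so that green moves are favored.

    policy: dict mapping each legal action to its probability.
    Adding delta to the logit of a green action is the same as
    multiplying its probability by exp(delta) before renormalizing.
    """
    boost = math.exp(delta)
    weights = {
        a: p * (boost if is_green(key, state, a, gamma) else 1.0)
        for a, p in policy.items()
    }
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def z_score(key, trajectory, gamma=0.5):
    """One-sided z-statistic for the number of green moves played.

    trajectory: list of (state, action) pairs observed for one player.
    Without the watermark, each move is green with probability about gamma,
    so a large positive value is evidence that the watermark is present.
    """
    n = len(trajectory)
    if n == 0:
        raise ValueError("trajectory must contain at least one move")
    green = sum(is_green(key, s, a, gamma) for s, a in trajectory)
    return (green - gamma * n) / math.sqrt(n * gamma * (1 - gamma))
```

In this sketch a larger `delta` pushes more probability onto green moves, which makes detection easier but moves the agent further from its original strategy. This mirrors the detectability and quality tradeoff described above.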