CR AI CY GT | Aug 4, 2025

Can LLMs effectively provide game-theoretic-based scenarios for cybersecurity?

Daniele Proverbio, Alessio Buscemi, Alessandro Di Stefano, The Anh Han, German Castignani, Pietro Liò

arXiv:2508.05670v18.62 citations | h-index: 25 | Frontiers of Computer Science

Originality: Incremental advance

AI Analysis

This work addresses the reliability of LLMs for cybersecurity simulations, highlighting risks in cross-lingual applications, but it is incremental as it builds on existing game-theoretic frameworks.

The study investigated whether large language models (LLMs) can effectively simulate game-theoretic scenarios in cybersecurity, finding that LLM-driven agents' payoffs are influenced by personality traits, knowledge of repeated interactions, and language choice, with deviations from expected outcomes across five languages.

Game theory has long served as a foundational tool in cybersecurity to test, predict, and design strategic interactions between attackers and defenders. The recent advent of Large Language Models (LLMs) offers new tools and challenges for the security of computer systems. In this work, we investigate whether classical game-theoretic frameworks can effectively capture the behaviours of LLM-driven actors and bots. Using a reproducible framework for game-theoretic LLM agents, we investigate two canonical scenarios -- the one-shot zero-sum game and the dynamic Prisoner's Dilemma -- and we test whether LLMs converge to expected outcomes or exhibit deviations due to embedded biases. Our experiments involve four state-of-the-art LLMs and span five natural languages (English, French, Arabic, Vietnamese, and Mandarin Chinese) to assess linguistic sensitivity. For both games, we observe that the final payoffs are influenced by agents' characteristics, such as personality traits or knowledge of repeated rounds. Moreover, we uncover an unexpected sensitivity of the final payoffs to the choice of language, which should warn against indiscriminate application of LLMs in cybersecurity and calls for in-depth studies, as LLMs may behave differently when deployed in different countries. We also employ quantitative metrics to evaluate the internal consistency and cross-language stability of LLM agents, to help guide the selection of the most stable LLMs and the optimisation of models for secure applications.
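For readers unfamiliar with the dynamic Prisoner's Dilemma mentioned in the abstract, the following minimal sketch shows how final payoffs accumulate over repeated rounds. It is not the authors' framework: the payoff values (5, 3, 1, 0) are the conventional textbook ones, and the names `PAYOFFS`, `play`, `tit_for_tat`, and `always_defect` are hypothetical, standing in for decisions that an LLM agent would make in the study.

```python
from typing import Callable, Dict, List, Tuple

# Conventional textbook payoffs (assumed here, not taken from the paper):
# temptation 5 > reward 3 > punishment 1 > sucker 0.
PAYOFFS: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

# A strategy sees the opponent's past moves and returns "C" or "D".
Strategy = Callable[[List[str]], str]


def always_defect(opponent_history: List[str]) -> str:
    """Defect every round (the one-shot Nash equilibrium action)."""
    return "D"


def tit_for_tat(opponent_history: List[str]) -> str:
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"


def play(a: Strategy, b: Strategy, rounds: int) -> Tuple[int, int]:
    """Play a repeated Prisoner's Dilemma and return the total payoffs."""
    hist_a: List[str] = []
    hist_b: List[str] = []
    total_a = total_b = 0
    for _ in range(rounds):
        move_a = a(hist_b)
        move_b = b(hist_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        total_a += pay_a
        total_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return total_a, total_b


if __name__ == "__main__":
    print(play(tit_for_tat, tit_for_tat, 10))
    print(play(tit_for_tat, always_defect, 10))
```

Under these assumed payoffs, mutual defection is the equilibrium of the one-shot game, so a comparison of an agent's accumulated payoff against that baseline is one simple way to see the kind of deviation the abstract describes.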

