AI | Aug 25, 2025

FAIRGAMER: Evaluating Biases in the Application of Large Language Models to Video Games

Bingkang Shi, Jen-tse Huang, Guoyi Li, Xiaodan Zhang, Zhongjiang Yao

arXiv:2508.17825v1 | 2 citations | h-index: 3 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses reliability issues for game developers and players by exposing critical biases in LLMs used for gaming applications, though it is incremental as it focuses on evaluation rather than mitigation.

The paper tackles the problem of social biases in large language models (LLMs) applied to video games, revealing that these biases degrade game balance. Grok-3 is the most severe case, with an average D_lstd score of 0.431.

Leveraging their advanced capabilities, Large Language Models (LLMs) demonstrate vast application potential in video games--from dynamic scene generation and intelligent NPC interactions to adaptive opponents--replacing or enhancing traditional game mechanics. However, LLMs' trustworthiness in this application has not been sufficiently explored. In this paper, we reveal that the models' inherent social biases can directly damage game balance in real-world gaming environments. To this end, we present FairGamer, the first bias evaluation benchmark for LLMs in video game scenarios, featuring six tasks and a novel metric, $D_{lstd}$. It covers three key scenarios in games where LLMs' social biases are particularly likely to manifest: Serving as Non-Player Characters, Interacting as Competitive Opponents, and Generating Game Scenes. FairGamer utilizes both reality-grounded and fully fictional game content, covering a variety of video game genres. Experiments reveal: (1) Decision biases directly cause game balance degradation, with Grok-3 (average $D_{lstd}$ score = 0.431) exhibiting the most severe degradation; (2) LLMs demonstrate isomorphic social/cultural biases toward both real and virtual world content, suggesting that these biases may stem from inherent model characteristics. These findings expose critical reliability gaps in LLMs' gaming applications. Our code and data are available in an anonymous GitHub repository: https://github.com/Anonymous999-xxx/FairGamer

