ChronoPlay: A Framework for Modeling Dual Dynamics and Authenticity in Game RAG Benchmarks
This work addresses a standardization problem for researchers and developers in AI and gaming, though its contribution is incremental because it builds on existing RAG concepts.
The paper tackles the lack of a dedicated benchmark for evaluating Retrieval Augmented Generation (RAG) systems in dynamic online gaming domains. It introduces ChronoPlay, a framework that automatically and continuously generates such benchmarks, and uses it to build the first dynamic RAG benchmark for gaming, offering insights into model performance under realistic conditions.
Retrieval Augmented Generation (RAG) systems are increasingly vital in dynamic domains like online gaming, yet the lack of a dedicated benchmark has impeded standardized evaluation in this area. The core difficulty lies in Dual Dynamics: the constant interplay between game content updates and the shifting focus of the player community. Furthermore, the necessity of automating such a benchmark introduces a critical requirement for player-centric authenticity to ensure generated questions are realistic. To address this integrated challenge, we introduce ChronoPlay, a novel framework for the automated and continuous generation of game RAG benchmarks. ChronoPlay utilizes a dual-dynamic update mechanism to track both forms of change, and a dual-source synthesis engine that draws from official sources and the player community to ensure both factual correctness and authentic query patterns. We instantiate our framework on three distinct games to create the first dynamic RAG benchmark for the gaming domain, offering new insights into model performance under these complex and realistic conditions. Code is available at: https://github.com/hly1998/ChronoPlay.
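To make the idea of Dual Dynamics concrete, the following is a minimal illustrative sketch of how a benchmark item might be flagged for regeneration when either the game content or the community's focus changes. It is not taken from the ChronoPlay code: the names `BenchmarkItem` and `needs_refresh`, the document-version comparison, and the `min_share` threshold are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkItem:
    """Hypothetical benchmark entry; the fields are assumptions, not ChronoPlay's schema."""
    question: str
    topic: str
    source_doc_id: str
    source_doc_version: int


def needs_refresh(item, current_versions, topic_share, min_share=0.05):
    """Return the reasons (possibly none) why an item should be regenerated.

    current_versions: maps document id -> latest version of that official document.
    topic_share: maps topic -> fraction of recent community questions on that topic.
    min_share: assumed cutoff below which a topic counts as no longer of interest.
    """
    reasons = []
    # Content dynamics: the official document backing the answer has changed.
    latest = current_versions.get(item.source_doc_id, item.source_doc_version)
    if latest != item.source_doc_version:
        reasons.append("content_updated")
    # Interest dynamics: players rarely ask about this topic any more.
    if topic_share.get(item.topic, 0.0) < min_share:
        reasons.append("interest_shifted")
    return reasons


# Example with made-up data: one item is affected by a patch, one by fading interest.
items = [
    BenchmarkItem("How much damage does the fire staff deal?", "weapons", "patch_notes", 3),
    BenchmarkItem("Where is the launch event vendor?", "launch_event", "event_guide", 1),
]
versions = {"patch_notes": 4, "event_guide": 1}
shares = {"weapons": 0.30, "launch_event": 0.01}
stale = {item.question: needs_refresh(item, versions, shares) for item in items}
```

Checking the two kinds of change separately keeps the reason for each refresh visible. That distinction matters if, for example, outdated answers should be regenerated while out-of-favor topics are simply down-weighted.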