Towards Objective Metrics for Procedurally Generated Video Game Levels
This work addresses the need for standardised, objective metrics for comparing procedural content generation methods, which is relevant to both video game developers and researchers. It is incremental in that it builds on existing simulation-based approaches.
The paper tackles the problem of evaluating procedurally generated video game levels by introducing two simulation-based metrics that use an A* agent to measure diversity and difficulty in a game-independent way. The diversity metric is shown to be more robust to changes in level size and representation than current methods, and to measure factors that directly affect playability. The difficulty metric correlates with existing estimates of difficulty in one of the tested domains but faces challenges in the other.
With increasing interest in procedural content generation by academia and game developers alike, it is vital that different approaches can be compared fairly. However, evaluating procedurally generated video game levels is often difficult, due to the lack of standardised, game-independent metrics. In this paper, we introduce two simulation-based evaluation metrics that involve analysing the behaviour of an A* agent to measure the diversity and difficulty of generated levels in a general, game-independent manner. Diversity is calculated by comparing action trajectories from different levels using the edit distance, and difficulty is measured as how much exploration and expansion of the A* search tree is necessary before the agent can solve the level. We demonstrate that our diversity metric is more robust to changes in level size and representation than current methods and additionally measures factors that directly affect playability, instead of focusing on visual information. The difficulty metric shows promise, as it correlates with existing estimates of difficulty in one of the tested domains, but it does face some challenges in the other domain. Finally, to promote reproducibility, we publicly release our evaluation framework.
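To make the two metrics concrete, the following is a minimal Python sketch of the ideas described above; it is not the released evaluation framework. The abstract does not specify the exact formulas, so several details here are assumptions for illustration: diversity is taken as the mean pairwise edit distance between action trajectories, normalised by the longer trajectory, and difficulty is approximated by the number of nodes A* expands before reaching the goal on a 4-connected grid with `#` as walls. The function names `edit_distance`, `diversity`, and `astar_expansions` are hypothetical.

```python
import heapq
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two action sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def diversity(trajectories):
    """Mean pairwise edit distance, normalised by the longer trajectory.

    The normalisation is an assumption, not taken from the paper.
    """
    pairs = list(combinations(trajectories, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        longest = max(len(a), len(b))
        total += edit_distance(a, b) / longest if longest else 0.0
    return total / len(pairs)


def astar_expansions(grid, start, goal):
    """Count nodes expanded by A* on a 4-connected grid ('#' is a wall).

    Returns None if the goal is unreachable. Using the raw expansion
    count as a difficulty proxy is an assumption for illustration.
    """
    rows, cols = len(grid), len(grid[0])

    def h(p):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    frontier = [(h(start), 0, start)]
    best = {start: 0}
    closed = set()
    expanded = 0
    while frontier:
        _, g, node = heapq.heappop(frontier)
        if node in closed:
            continue
        closed.add(node)
        expanded += 1
        if node == goal:
            return expanded
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                ng = g + 1
                if ng < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = ng
                    heapq.heappush(frontier, (ng + h((nr, nc)), ng, (nr, nc)))
    return None
```

Under these assumptions, two levels solved with identical action sequences contribute a distance of 0 and two levels with entirely different sequences of equal length contribute 1, so the diversity score stays in the range 0 to 1 regardless of level size. Because the comparison is made on agent actions rather than tiles, it reflects how a level plays rather than how it looks.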