MA · Apr 9

Open-Ended Video Game Glitch Detection with Agentic Reasoning and Temporal Grounding

arXiv:2604.07818 · 12.3 · h-index: 1
Predicted impact: top 45% in MA · last 90 days
Originality: Incremental advance
AI Analysis

The paper addresses the problem of detecting and describing glitches in video games, a novel domain-specific task relevant to AI and gaming researchers.

The paper tackles open-ended video game glitch detection by introducing VideoGlitchBench, a benchmark of 5,238 gameplay videos from 120 games, and proposes GliDe, an agentic framework that achieves substantially stronger performance than the corresponding vanilla model baselines.

Open-ended video game glitch detection aims to identify glitches in gameplay videos, describe them in natural language, and localize when they occur. Unlike conventional game glitch understanding tasks, which have largely been framed as image-level recognition or closed-form question answering, this task requires reasoning directly over continuous gameplay videos about game-specific dynamics such as mechanics, physics, rendering, animation, and expected state transitions. It also requires distinguishing true glitches from unusual but valid in-game events. To support this task, we introduce VideoGlitchBench, the first benchmark for open-ended video game glitch detection with temporal localization. VideoGlitchBench contains 5,238 gameplay videos from 120 games, each annotated with detailed glitch descriptions and precise temporal spans, enabling unified evaluation of semantic understanding and temporal grounding. We further propose GliDe, an agentic framework with three key components: a game-aware contextual memory for informed reasoning, a debate-based reflector for multi-perspective glitch detection and verification, and an event-level grounding module that recovers complete glitch intervals from fragmented temporal evidence. We also design a task-specific evaluation protocol that jointly measures semantic fidelity and temporal accuracy. Experiments show that this task remains highly challenging for current multimodal models, while GliDe achieves substantially stronger performance than corresponding vanilla model baselines.
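
To make the temporal side of the task concrete, the sketch below shows one plausible way to turn fragmented temporal evidence into complete intervals and to score a predicted interval against an annotated span. It is an illustration only, not GliDe's grounding module or the paper's evaluation protocol: the gap-based merging rule, the `max_gap` threshold, the use of temporal IoU, and the names `merge_fragments` and `temporal_iou` are all assumptions, since the abstract does not specify any of them.

```python
from typing import List, Tuple

# An interval is (start_sec, end_sec) within a gameplay video.
Interval = Tuple[float, float]


def merge_fragments(fragments: List[Interval], max_gap: float = 1.0) -> List[Interval]:
    """Merge fragments separated by at most `max_gap` seconds.

    Hypothetical stand-in for event-level grounding: the merging rule and
    the threshold are assumptions, not the paper's method.
    """
    merged: List[Interval] = []
    for start, end in sorted(fragments):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous event: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            # Too far from the previous event: start a new one.
            merged.append((start, end))
    return merged


def temporal_iou(pred: Interval, gold: Interval) -> float:
    """Intersection-over-union of two time intervals (0.0 if disjoint)."""
    inter = max(0.0, min(pred[1], gold[1]) - max(pred[0], gold[0]))
    union = (pred[1] - pred[0]) + (gold[1] - gold[0]) - inter
    return inter / union if union > 0 else 0.0


# Three detected fragments; the first two are 0.5 s apart and get merged.
events = merge_fragments([(2.0, 3.0), (3.5, 4.5), (10.0, 11.0)], max_gap=1.0)
score = temporal_iou(events[0], (2.0, 5.0))
```

Here `events` contains two intervals, `(2.0, 4.5)` and `(10.0, 11.0)`, and `score` compares the first against a hypothetical annotated span of 2.0 to 5.0 seconds. A joint metric like the one the abstract describes would additionally need a semantic-fidelity term for the glitch description, which this sketch leaves out.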

