Specification-Driven Generation and Evaluation of Discrete-Event World Models via the DEVS Formalism
The work addresses the need for adaptable and verifiable world models in domains such as queueing systems and multi-agent coordination. It offers a middle ground between hand-engineered and neural models, though the contribution is incremental in that it combines existing formalisms with LLMs.
The paper tackles the problem of creating reliable and flexible world models for agentic systems. It proposes synthesizing discrete-event world models from natural-language specifications using the DEVS formalism and a staged LLM-based pipeline. The resulting models are consistent over long horizons, verifiable, and efficient to generate online.
World models are essential for planning and evaluation in agentic systems, yet existing approaches lie at two extremes: hand-engineered simulators that offer consistency and reproducibility but are costly to adapt, and implicit neural models that are flexible but difficult to constrain, verify, and debug over long horizons. We seek a principled middle ground that combines the reliability of explicit simulators with the flexibility of learned models, allowing world models to be adapted during online execution. By targeting a broad class of environments whose dynamics are governed by the ordering, timing, and causality of discrete events, such as queueing and service operations, embodied task planning, and message-mediated multi-agent coordination, we advocate explicit, executable discrete-event world models synthesized directly from natural-language specifications. Our approach adopts the DEVS formalism and introduces a staged LLM-based generation pipeline that separates structural inference of component interactions from component-level event and timing logic. To evaluate generated models without a unique ground truth, simulators emit structured event traces that are validated against specification-derived temporal and semantic constraints, enabling reproducible verification and localized diagnostics. Together, these contributions produce world models that are consistent over long-horizon rollouts, verifiable from observable behavior, and efficient to synthesize on demand during online execution.
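To make the two central ideas concrete, the sketch below shows a hand-written atomic DEVS component that emits a structured event trace, followed by a check of that trace against temporal and causal constraints. It is an illustration only, not the paper's pipeline or its generated code: the `Server` model, the `simulate` loop, the `check_trace` function, the trace record fields, and the two constraints are all hypothetical, and the simulator is simplified to a single atomic model in which an internal event wins any tie with a simultaneous arrival.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Server:
    """Atomic DEVS model of a single-server FIFO queue (hypothetical example)."""
    service_time: float = 2.0
    queue: list = field(default_factory=list)
    current: object = None   # job in service, or None when idle
    sigma: float = math.inf  # time remaining until the next internal event

    def time_advance(self):
        # ta(s): how long the model stays in its current state
        return self.sigma

    def output(self):
        # lambda(s): emitted just before the internal transition
        return ("depart", self.current)

    def delta_int(self):
        # Internal transition: finish the current job, start the next one if any
        if self.queue:
            self.current = self.queue.pop(0)
            self.sigma = self.service_time
        else:
            self.current = None
            self.sigma = math.inf

    def delta_ext(self, elapsed, job):
        # External transition: a job arrives `elapsed` time units after the last event
        if self.current is None:
            self.current = job
            self.sigma = self.service_time
        else:
            self.queue.append(job)
            self.sigma -= elapsed  # service already in progress keeps its deadline


def simulate(model, arrivals):
    """Run the model over time-sorted (time, job) arrivals and return the event trace."""
    trace, pending, t_last = [], list(arrivals), 0.0
    while True:
        t_int = t_last + model.time_advance()
        t_ext = pending[0][0] if pending else math.inf
        if t_int == math.inf and t_ext == math.inf:
            return trace
        if t_int <= t_ext:  # assumption: internal events win ties
            kind, job = model.output()
            trace.append({"t": t_int, "event": kind, "job": job})
            model.delta_int()
            t_last = t_int
        else:
            t, job = pending.pop(0)
            trace.append({"t": t, "event": "arrive", "job": job})
            model.delta_ext(t - t_last, job)
            t_last = t


def check_trace(trace, service_time):
    """Return a list of violation messages for two illustrative constraints."""
    violations, arrived, last_t = [], {}, -math.inf
    for ev in trace:
        if ev["t"] < last_t:
            violations.append(f"time goes backwards at t={ev['t']}")
        last_t = ev["t"]
        if ev["event"] == "arrive":
            arrived[ev["job"]] = ev["t"]
        elif ev["event"] == "depart":
            if ev["job"] not in arrived:
                # Causality: a job cannot depart before it has arrived
                violations.append(f"job {ev['job']} departs without arriving")
            elif ev["t"] - arrived[ev["job"]] < service_time:
                # Timing: time in system is at least one service time
                violations.append(f"job {ev['job']} served too quickly")
    return violations


if __name__ == "__main__":
    trace = simulate(Server(service_time=2.0), [(0.0, "a"), (1.0, "b"), (5.0, "c")])
    for ev in trace:
        print(ev)
    print("violations:", check_trace(trace, 2.0))
```

Because each violation message names the offending job or timestamp, a failed check points to a specific event rather than to the rollout as a whole, which is the kind of localized diagnostic the abstract refers to. In this sketch the constraints are written by hand; in the approach described above they would be derived from the natural-language specification.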