AISE | May 4

Learning Correct Behavior from Examples: Validating Sequential Execution in Autonomous Agents

arXiv:2605.03159 | 27.0
AI Analysis

This work addresses the challenge of validating sequential behavior in autonomous agents, offering a practical solution for domains like UI testing and robotics where traditional testing is costly.

The paper presents an algorithm that learns correct sequential behavior from just 2-10 passing execution traces and validates new executions against the learned model. In controlled experiments, it detected product bugs and false successes with high accuracy using only 3 training traces.

As autonomous agents become increasingly sophisticated, validating their sequential behavior presents a significant challenge. Traditional testing approaches require manual specification, exact sequence matching, or thousands of training examples. We present a novel algorithm that automatically learns correct behavior from just 2-10 passing execution traces and validates new executions against this learned model. Our approach combines dominator analysis from compiler theory with multimodal large language model-powered semantic understanding to identify essential states and handle non-deterministic behavior. The system constructs a generalized ground truth model using Prefix Tree Acceptors, merges traces through multi-tiered equivalence detection, and validates new executions via topological subsequence matching. In controlled experiments, our system achieved high accuracy in detecting product bugs and false successes using only 3 training traces. This approach provides explainable validation results with coverage metrics and works across diverse domains including UI testing, code generation, and robotic processes.
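The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that states are plain string labels, that two states are equivalent exactly when their labels match (a stand-in for the paper's multi-tiered, LLM-assisted equivalence detection), and that "essential states" are the states that dominate the end of the merged graph. All function names and the `<start>`/`<end>` sentinels are hypothetical.

```python
START, END = "<start>", "<end>"  # hypothetical sentinel labels, not from the paper


def build_pta(traces):
    """Prefix Tree Acceptor as nested dicts: state label -> child node."""
    root = {}
    for trace in traces:
        node = root
        for state in [*trace, END]:
            node = node.setdefault(state, {})
    return root


def merge_by_label(pta):
    """Collapse PTA nodes with equal labels into one graph node (assumed equivalence rule)."""
    succ = {START: set(), END: set()}
    stack = [(START, pta)]
    while stack:
        label, node = stack.pop()
        for child_label, child in node.items():
            succ.setdefault(label, set()).add(child_label)
            succ.setdefault(child_label, set())
            stack.append((child_label, child))
    return succ


def dominators(succ):
    """Iterative dataflow: dom[n] = {n} | intersection of dom[p] over predecessors p."""
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for a, outs in succ.items():
        for b in outs:
            pred[b].add(a)
    dom = {n: set(nodes) for n in nodes}
    dom[START] = {START}
    changed = True
    while changed:
        changed = False
        for n in nodes - {START}:
            new = {n} | set.intersection(*(dom[p] for p in pred[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def essential_states(traces):
    """States every passing trace must visit, ordered from first to last."""
    if not traces:
        raise ValueError("need at least one passing trace")
    dom = dominators(merge_by_label(build_pta(traces)))
    # Dominators of END form a chain; earlier ones have smaller dominator sets.
    return sorted(dom[END] - {START, END}, key=lambda s: len(dom[s]))


def validate(trace, essentials):
    """Greedy ordered subsequence match; returns (passed, coverage, missing)."""
    pos, matched = 0, []
    for state in essentials:
        try:
            pos = trace.index(state, pos) + 1
            matched.append(state)
        except ValueError:
            pass  # this essential state does not occur in order; keep going
    missing = [s for s in essentials if s not in matched]
    coverage = len(matched) / len(essentials) if essentials else 1.0
    return not missing, coverage, missing


# Three passing traces with optional, non-deterministic steps.
passing = [
    ["open_app", "login", "dismiss_popup", "search", "checkout"],
    ["open_app", "login", "search", "apply_coupon", "checkout"],
    ["open_app", "login", "search", "checkout"],
]
essentials = essential_states(passing)
print(essentials)
print(validate(["open_app", "login", "search", "view_reviews", "checkout"], essentials))
print(validate(["open_app", "search", "checkout"], essentials))
```

In this toy example, the optional steps (`dismiss_popup`, `apply_coupon`) are not required because some passing trace bypasses them, while a new execution that skips `login` fails and is reported with the missing state and a coverage of 0.75. A real system would need a far richer notion of state equivalence than label equality.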
