When Outcome Looks Right But Discipline Fails: Trace-Based Evaluation Under Hidden Competitor State
For multi-agent systems with hidden competitor state, this work proposes a diagnostic evaluation paradigm that detects when outcome metrics mask behavioral failures that would make an agent unsafe to deploy.
The paper shows that outcome-only evaluation can certify unsafe agents: in hotel pricing, for example, a policy can meet revenue KPIs while violating rate discipline. It introduces discipline stability, a trace-based evaluation paradigm, and reports that across a two-hotel benchmark and a hidden-budget bidding task, reward-only PPO variants miss trace alignment, while trace-prior policies better preserve price or bid distributions.
Outcome-only evaluation can certify economically unsafe agents: a policy can hit a business KPI while violating deployable behavioral discipline. In hotel pricing with hidden competitor state, a learner can achieve plausible revenue per available room while failing to preserve the rate discipline of a rule-based revenue-management competitor. We introduce discipline stability, a trace-based evaluation paradigm: define the benchmark behavior, restrict observations to the deployment regime, induce trace diagnostics from failure, separate mechanisms with ablations, and test transfer and deployment. Across a two-hotel benchmark and a compact hidden-budget bidding task, reward-only PPO variants miss trace alignment; revealing hidden state reduces label uncertainty; deterministic copy collapses uncertainty; and trace-prior or corrected history policies better preserve price or bid distributions. Pure behavior cloning is nearly enough for symmetric imitation, while Trace-Prior RL adds bounded adaptation under capacity asymmetry. The contribution is an evaluation and benchmark paradigm, not a new optimizer or a universal claim about MARL.
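The gap between outcome and trace evaluation can be illustrated with a small sketch. It is not the paper's diagnostic: the abstract does not specify the trace metric, so total variation distance between empirical price distributions is assumed here as a stand-in. The function names and the four-day price traces are hypothetical.

```python
from collections import Counter


def revpar(prices, rooms_sold, rooms_available):
    """Revenue per available room: total room revenue / total room-nights available."""
    revenue = sum(p * s for p, s in zip(prices, rooms_sold))
    return revenue / (rooms_available * len(prices))


def total_variation(trace_a, trace_b):
    """Total variation distance between the empirical distributions of two traces.

    Assumed stand-in for a trace-alignment diagnostic; 0 means identical
    distributions, 1 means disjoint support.
    """
    count_a, count_b = Counter(trace_a), Counter(trace_b)
    support = set(count_a) | set(count_b)
    return 0.5 * sum(
        abs(count_a[x] / len(trace_a) - count_b[x] / len(trace_b)) for x in support
    )


# Hypothetical data: a 10-room hotel observed over 4 days.
ROOMS = 10
benchmark_prices = [100, 100, 120, 120]  # rule-based competitor's rate ladder
benchmark_sold = [8, 8, 6, 6]
learner_prices = [80, 80, 144, 144]      # learner swings between discount and premium
learner_sold = [10, 10, 5, 5]

outcome_benchmark = revpar(benchmark_prices, benchmark_sold, ROOMS)
outcome_learner = revpar(learner_prices, learner_sold, ROOMS)
trace_gap = total_variation(benchmark_prices, learner_prices)

print(outcome_benchmark, outcome_learner)  # 76.0 76.0 (identical outcome metric)
print(trace_gap)                           # 1.0 (price distributions do not overlap)
```

In this toy case an outcome-only check would treat the two policies as equivalent, while the trace comparison flags that the learner never posts a price the benchmark uses.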