Unpredictability dissociates from structured control in language agents
For researchers building language agents, this work provides evidence that stochastic sampling does not substitute for structured control mechanisms, though the findings are limited to the specific agent family tested.
The paper tests whether stochastic sampling can substitute for structured control mechanisms in language agents. Across a baseline lesion matrix, a matched-interface control and a behavioral test spanning seven datasets, stochastic unpredictability did not reproduce structured, action-coupled control: the structured agent maintained stronger action-field coupling than the stochastic, post-hoc, scrambled and verbosity controls in every dataset.
Unpredictable behavior is often taken as evidence of control, yet stochastic dispersion and structured action control need not coincide. This paper tests whether stochastic sampling can substitute for structured mechanisms that couple reasons, memory, self-state and inhibition to action selection in a language-agent implementation whose control components can be selectively disabled. In a seven-dataset baseline lesion matrix comprising 74,352 calls, the high-stochasticity comparator was more unpredictable than the structured-control variant in 7/7 datasets, whereas targeted reason and veto lesions reduced the expected structured-control profiles in 7/7 datasets each. In a matched-interface control spanning 26,946 generations, the structured agent maintained stronger action-field coupling than all stochastic, post-hoc, scrambled and verbosity controls across every dataset. The primary behavioral test removed free-form trace wording from the evaluation: 57,816 scored records showed the structured-control variant exceeding the high-stochasticity comparator or the reason/veto lesions in 7/7 datasets for all predefined behavioral components. Later open-weight runs extended the no-context controls to Qwen2.5 7B, 14B and 32B and to an independent Mistral-7B family across 20 task families and three agent scaffolds; no-fields, scrambled-context and distribution-matched controls failed to recover structured action control. A three-annotator blinded audit over 1,200 overlap items preserved high agreement. Strict entropy matching, strict token/compute matching and a formal counterfactual-flip stress test did not meet their gates and are treated as limitations. Stochastic unpredictability did not reproduce structured, action-coupled control in this implemented agent family.
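The dissociation between unpredictability and action coupling can be illustrated with a toy calculation. The sketch below is not the paper's metric or data: it assumes, purely for illustration, that unpredictability is measured as Shannon entropy of the action distribution and that coupling is measured as mutual information between a hypothetical "reason" field and the chosen action. The field values, action labels and trial counts are all invented.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of the empirical distribution of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits: H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Hypothetical toy data: one "reason" field value per trial (8 trials).
reasons = ["safe", "risky"] * 4

# Assumed structured agent: the action is fully determined by the reason.
structured_actions = ["act" if r == "safe" else "veto" for r in reasons]

# Assumed stochastic comparator: four actions spread evenly,
# arranged so that they carry no information about the reason.
stochastic_actions = ["a", "a", "b", "b", "c", "c", "d", "d"]

results = {
    "structured": (
        entropy(structured_actions),
        mutual_information(reasons, structured_actions),
    ),
    "stochastic": (
        entropy(stochastic_actions),
        mutual_information(reasons, stochastic_actions),
    ),
}

for name, (h, mi) in results.items():
    print(f"{name}: action entropy = {h:.2f} bits, reason-action MI = {mi:.2f} bits")
```

In this toy setup the stochastic comparator has the higher action entropy while its actions carry no information about the reason field, whereas the structured variant is less dispersed but fully coupled. This mirrors the qualitative pattern reported above, under the stated assumptions only.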