General Characterization of Agents by States they Visit
This work provides a more general and robust method for characterizing decision-making agents, which is significant for researchers and practitioners in reinforcement learning who need to analyze, compare, or manipulate agent behaviors.
The paper addresses the limitations of existing behavioural characterizations (BCs) for decision-making agents, particularly their reliance on actions and their restricted applicability. It proposes a novel BC based on the states visited by policies, demonstrating its effectiveness in stochastic environments and its utility in analyzing training algorithms, novelty search, and trust-region policy optimization.
Behavioural characterizations (BCs) of decision-making agents, or of their policies, are used to study the outcomes of training algorithms and, as part of the algorithms themselves, to encourage unique policies, match an expert policy, or restrict how much the policy changes per update. However, previously presented solutions are not generally applicable, due to a lack of expressive power, computational constraints, or constraints on the policy or environment. Furthermore, many BCs rely on the actions of policies. We discuss and demonstrate how these BCs can be misleading, especially in stochastic environments, and propose a novel solution based on which states policies visit. We run experiments to evaluate the quality of the proposed BC against baselines and evaluate their use in studying training algorithms, novelty search and trust-region policy optimization. The code is available at https://github.com/miffyli/policy-supervectors.
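To make the concern about action-based BCs concrete, the following is a minimal sketch and not the paper's method or code. It assumes a hypothetical five-state corridor in which the environment ignores the chosen action with probability `slip` and moves at random instead. The helper names (`step_distribution`, `state_visitation`, `total_variation`, `compare`) are illustrative, and a plain state-visitation histogram stands in for a state-based BC. Two constant policies, "always left" and "always right", are compared by their action distributions and by their state-visitation distributions.

```python
# Toy illustration only: hypothetical environment and helper names,
# not taken from the paper or its repository.
N_STATES = 5          # states 0..4 on a line, walls at both ends
START = 2             # every episode starts in the middle
HORIZON = 20          # number of steps per episode
LEFT, RIGHT = -1, +1


def step_distribution(dist, action, slip):
    """Propagate a state distribution one step under a fixed action.

    With probability `slip` the environment ignores the action and
    moves left or right uniformly at random.
    """
    new = [0.0] * N_STATES
    outcomes = ((action, 1.0 - slip), (LEFT, slip / 2), (RIGHT, slip / 2))
    for state, prob_state in enumerate(dist):
        for move, prob_move in outcomes:
            nxt = min(max(state + move, 0), N_STATES - 1)  # clamp at walls
            new[nxt] += prob_state * prob_move
    return new


def state_visitation(action, slip):
    """Exact state-visitation distribution, averaged over HORIZON steps,
    for a policy that always picks `action`."""
    dist = [0.0] * N_STATES
    dist[START] = 1.0
    total = [0.0] * N_STATES
    for _ in range(HORIZON):
        dist = step_distribution(dist, action, slip)
        total = [t + d for t, d in zip(total, dist)]
    return [t / HORIZON for t in total]


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def compare(slip):
    """Return (action-based distance, state-based distance) between the
    'always left' and 'always right' policies."""
    # Each policy puts all probability mass on a different action.
    action_distance = total_variation([1.0, 0.0], [0.0, 1.0])
    state_distance = total_variation(
        state_visitation(LEFT, slip), state_visitation(RIGHT, slip)
    )
    return action_distance, state_distance


if __name__ == "__main__":
    for slip in (0.0, 0.5, 1.0):
        action_distance, state_distance = compare(slip)
        print(f"slip={slip}: actions={action_distance:.3f}, states={state_distance:.3f}")
```

In this sketch the action-based distance is maximal for every value of `slip`, because it never looks at what the environment does with the actions. The state-based distance shrinks as `slip` grows and vanishes once the environment ignores actions entirely, which matches the intuition that the two policies then behave identically.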