Policy Space Identification in Configurable Environments
This work addresses the challenge of understanding an agent's capabilities in configurable environments. The contribution is incremental in that it builds on existing frameworks such as Configurable MDPs.
The paper tackles the problem of identifying, from demonstrations, which policy parameters a learning agent can control. It introduces statistical testing methods and leverages configurable environments to improve identification. Empirical results demonstrate the effectiveness of the proposed identification rules in discrete and continuous domains.
We study the problem of identifying the policy space of a learning agent, having access to a set of demonstrations generated by its optimal policy. We introduce an approach based on statistical testing to identify the set of policy parameters the agent can control, within a larger parametric policy space. After presenting two identification rules (combinatorial and simplified), applicable under different assumptions on the policy space, we provide a probabilistic analysis of the simplified one in the case of linear policies belonging to the exponential family. To improve the performance of our identification rules, we frame the problem in the recently introduced framework of Configurable Markov Decision Processes, exploiting the opportunity of configuring the environment to induce the agent to reveal which parameters it can control. Finally, we provide an empirical evaluation, on both discrete and continuous domains, to demonstrate the effectiveness of our identification rules.
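To make the idea of a per-parameter statistical test concrete, the following is a minimal sketch and not the paper's actual rules, which the abstract does not specify. It assumes a linear Gaussian policy with unit noise variance, assumes that a parameter the agent cannot control stays fixed at 0, and uses a likelihood ratio test at the 5% level. All names (`solve`, `rss`, `identify_controllable`, `demo`) are hypothetical.

```python
import random

# 0.95 quantile of the chi-square distribution with 1 degree of freedom.
CHI2_1DOF_95 = 3.841


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def rss(states, actions, free):
    """Residual sum of squares of the least-squares fit that uses only the
    parameters listed in `free`; all other parameters are held at 0."""
    if not free:
        return sum(a * a for a in actions)
    A = [[sum(s[i] * s[j] for s in states) for j in free] for i in free]
    b = [sum(s[i] * a for s, a in zip(states, actions)) for i in free]
    theta = solve(A, b)
    return sum(
        (a - sum(t * s[i] for t, i in zip(theta, free))) ** 2
        for s, a in zip(states, actions)
    )


def identify_controllable(states, actions, threshold=CHI2_1DOF_95):
    """Test each parameter separately: H0 says parameter i is fixed at 0.
    With unit noise variance, the likelihood ratio statistic equals the
    increase in residual sum of squares when parameter i is forced to 0."""
    full = list(range(len(states[0])))
    rss_full = rss(states, actions, full)
    controllable = []
    for i in full:
        restricted = [j for j in full if j != i]
        statistic = rss(states, actions, restricted) - rss_full
        if statistic > threshold:
            controllable.append(i)
    return controllable


def demo(seed=0, n=2000):
    """Synthetic demonstrations: the agent controls parameters 0 and 2 only."""
    rng = random.Random(seed)
    true_theta = [1.5, 0.0, -0.8]  # illustrative values, not from the paper
    states = [[rng.gauss(0, 1) for _ in true_theta] for _ in range(n)]
    actions = [
        sum(t * x for t, x in zip(true_theta, s)) + rng.gauss(0, 1)
        for s in states
    ]
    return identify_controllable(states, actions)
```

Calling `demo()` returns the indices whose test rejects the null hypothesis. Because each test is run at the 5% level, a parameter the agent does not control can still be flagged occasionally. A sketch like this also shows why configuring the environment may help: an agent whose optimal value for a controllable parameter happens to coincide with the default cannot be distinguished from one that cannot control that parameter unless the environment is changed.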