Empirically Verifying Hypotheses Using Reinforcement Learning
This work addresses the challenge of automated hypothesis testing in AI systems, though it appears incremental, as it builds on existing RL methods with a specific structural adaptation.
The paper formulates the verification of hypotheses about world dynamics as a reinforcement learning task. It shows that RL agents can succeed by exploiting a factorization of hypotheses into {pre-condition, action sequence, post-condition} triplets, and that subsequent fine-tuning lets the agent verify hypotheses that do not factorize this way.
This paper formulates hypothesis verification as an RL problem. Specifically, we aim to build an agent that, given a hypothesis about the dynamics of the world, can take actions to generate observations which can help predict whether the hypothesis is true or false. Existing RL algorithms fail to solve this task, even for simple environments. In order to train the agents, we exploit the underlying structure of many hypotheses, factorizing them as {pre-condition, action sequence, post-condition} triplets. By leveraging this structure we show that RL agents are able to succeed at the task. Furthermore, subsequent fine-tuning of the policies allows the agent to correctly verify hypotheses not amenable to the above factorization.
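The triplet factorization can be illustrated with a minimal Python sketch. Everything here is a hypothetical illustration, not the authors' code: the toy light-switch environment, the `Hypothesis` class, and the helpers `rollout`, `verify_from_trajectory` and `structured_reward` are assumptions made for exposition. In the paper an RL policy learns to produce the trajectory; here the action sequence is hand-written, and the binary reward (1 if the trajectory actually tests the hypothesis, else 0) is an assumed simplification, not the paper's reward function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical world state: named boolean facts.
State = Dict[str, bool]


@dataclass(frozen=True)
class Hypothesis:
    """A hypothesis factorized as a {pre-condition, action sequence, post-condition} triplet."""
    pre_condition: Callable[[State], bool]
    actions: Sequence[str]
    post_condition: Callable[[State], bool]


def step(state: State, action: str) -> State:
    """Toy dynamics (assumed): 'toggle' flips the light; any other action does nothing."""
    new_state = dict(state)
    if action == "toggle":
        new_state["light_on"] = not state["light_on"]
    return new_state


def rollout(start: State, actions: Sequence[str]) -> List[State]:
    """Return states s_0..s_T, where states[t] is the state before actions[t]."""
    states = [start]
    for action in actions:
        states.append(step(states[-1], action))
    return states


def verify_from_trajectory(hyp: Hypothesis, states: List[State],
                           actions: Sequence[str]) -> Optional[bool]:
    """Find the first time t where the pre-condition holds and the hypothesized
    action sequence is executed next; return whether the post-condition holds
    afterwards. Return None if the trajectory never tests the hypothesis."""
    k = len(hyp.actions)
    for t in range(len(actions) - k + 1):
        if hyp.pre_condition(states[t]) and list(actions[t:t + k]) == list(hyp.actions):
            return hyp.post_condition(states[t + k])
    return None


def structured_reward(hyp: Hypothesis, states: List[State],
                      actions: Sequence[str]) -> float:
    """Assumed reward: 1.0 if the agent set up the pre-condition and executed the
    action sequence (i.e. produced a test of the hypothesis), else 0.0."""
    return 1.0 if verify_from_trajectory(hyp, states, actions) is not None else 0.0


# "If the light is off and you toggle, the light turns on" (true in this toy world).
true_hyp = Hypothesis(lambda s: not s["light_on"], ("toggle",), lambda s: s["light_on"])
# "If the light is off and you toggle, the light stays off" (false in this toy world).
false_hyp = Hypothesis(lambda s: not s["light_on"], ("toggle",), lambda s: not s["light_on"])

actions = ["wait", "toggle", "toggle"]
states = rollout({"light_on": True}, actions)

if __name__ == "__main__":
    print(verify_from_trajectory(true_hyp, states, actions))   # True
    print(verify_from_trajectory(false_hyp, states, actions))  # False
    print(structured_reward(true_hyp, states, actions))        # 1.0
```

The first `toggle` turns the light off, which establishes the pre-condition; the second `toggle` is the hypothesized action, after which the post-condition is checked. A trajectory that never reaches the pre-condition (for example, only `wait` actions from a lit state) yields `None` and zero reward, which is the sense in which the factorization gives the agent a concrete sub-goal to reach before a prediction can be made.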