On the Identifiability of Latent Action Policies
This work addresses a theoretical gap in representation learning for reinforcement learning, though its contribution is incremental because it builds on the existing LAPO framework.
The paper tackles the problem of identifiability in latent action policy learning (LAPO) from video data. It proves that an entropy-regularized LAPO objective identifies action representations under suitable conditions, a result that offers an explanation for the practical success of discrete action representations.
We study the identifiability of latent action policy learning (LAPO), a framework introduced recently to discover representations of actions from video data. We formally describe desiderata for such representations, their statistical benefits and potential sources of unidentifiability. Finally, we prove that an entropy-regularized LAPO objective identifies action representations satisfying our desiderata, under suitable conditions. Our analysis provides an explanation for why discrete action representations perform well in practice.