Provably Correct Automata Embeddings for Optimal Automata-Conditioned Reinforcement Learning
This work closes a theoretical gap for reinforcement learning researchers and practitioners by establishing provable correctness for multi-task learning with automata. The contribution is incremental in that it builds on existing automata-conditioned RL methods.
The paper addresses the lack of theoretical guarantees in automata-conditioned reinforcement learning. It provides a framework showing that the problem is probably approximately correct (PAC) learnable and presents a technique for learning provably correct automata embeddings that guarantee optimal multi-task policy learning, supported by experimental validation.
Automata-conditioned reinforcement learning (RL) has shown promising results for learning multi-task policies capable of performing temporally extended objectives given at runtime. It does so by pretraining and freezing automata embeddings before training the downstream policy. However, no theoretical guarantees have been given for this approach. This work provides a theoretical framework for the automata-conditioned RL problem and shows that it is probably approximately correct learnable. We then present a technique for learning provably correct automata embeddings, guaranteeing optimal multi-task policy learning. Our experimental evaluation confirms these theoretical results.
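To make the setup concrete, the following is a minimal sketch of how a policy input might be conditioned on a task automaton. It is an illustration under stated assumptions, not the method of the paper: the `DFA` class, the `conditioned_input` helper, the example task, and the per-state lookup table standing in for pretrained, frozen embeddings are all hypothetical, and the embedding values are arbitrary placeholders rather than learned ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DFA:
    """A deterministic finite automaton encoding a temporally extended task."""
    start: int
    accepting: frozenset
    transitions: dict  # maps (state, symbol) -> next state

    def step(self, state, symbol):
        # Assumption of this sketch: a missing transition is a self-loop.
        return self.transitions.get((state, symbol), state)


def conditioned_input(observation, embedding_table, dfa_state):
    """Concatenate the observation with the frozen embedding of the task state."""
    return list(observation) + list(embedding_table[dfa_state])


# Hypothetical task: "see symbol a, then see symbol b".
task = DFA(
    start=0,
    accepting=frozenset({2}),
    transitions={(0, "a"): 1, (1, "b"): 2},
)

# Stand-in for pretrained embeddings. They are fixed before policy training
# and never updated afterwards; the values here are placeholders.
frozen_embeddings = {
    0: (1.0, 0.0, 0.0),
    1: (0.0, 1.0, 0.0),
    2: (0.0, 0.0, 1.0),
}

# Advance the automaton on the symbols observed during an episode.
state = task.start
for symbol in ["b", "a", "b"]:
    state = task.step(state, symbol)

# The downstream policy would consume this vector instead of the raw observation.
policy_input = conditioned_input([0.5, -0.2], frozen_embeddings, state)
```

In this sketch the automaton tracks how much of the task remains, and the policy sees that progress only through the embedding. A lookup table keyed by automaton state is a deliberate simplification; an encoder that maps whole automata to vectors would take its place in a multi-task setting where tasks are given at runtime.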