Do It for HER: First-Order Temporal Logic Reward Specification in Reinforcement Learning (Extended Version)

Pierriccardo Olivieri, Fausto Lasca, Alessandro Gianola, Matteo Papini

arXiv:2602.06227

Originality: Incremental advance

AI Analysis

This work addresses reward specification in reinforcement learning for complex, unstructured domains and offers a more expressive, reusable framework. The advance appears incremental, since it builds on existing temporal-logic reward specification and Hindsight Experience Replay (HER) methods.

The authors tackle the problem of specifying complex non-Markovian rewards in reinforcement learning by proposing a framework based on Linear Temporal Logic Modulo Theories over finite traces (LTLfMT), which enables natural task specification over unstructured data without manual predicate encoding. They address the resulting theoretical and computational challenges by identifying a tractable fragment, and they introduce a method that combines reward machines with HER, evaluated in a continuous-control setting.

In this work, we propose a novel framework for the logical specification of non-Markovian rewards in Markov Decision Processes (MDPs) with large state spaces. Our approach leverages Linear Temporal Logic Modulo Theories over finite traces (LTLfMT), a more expressive extension of classical temporal logic in which predicates are first-order formulas of arbitrary first-order theories rather than simple Boolean variables. This enhanced expressiveness enables the specification of complex tasks over unstructured and heterogeneous data domains, promoting a unified and reusable framework that eliminates the need for manual predicate encoding. However, the increased expressive power of LTLfMT introduces additional theoretical and computational challenges compared to standard LTLf specifications. We address these challenges from a theoretical standpoint by identifying a fragment of LTLfMT that is tractable yet sufficiently expressive for reward specification in an infinite-state-space context. From a practical perspective, we introduce a method based on reward machines and Hindsight Experience Replay (HER) to translate first-order logic specifications and address reward sparsity. We evaluate this approach in a continuous-control setting using Non-Linear Arithmetic Theory, showing that it enables natural specification of complex tasks. Experimental results show that a tailored implementation of HER is essential for solving tasks with complex goals.
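To make the combination of reward machines, first-order predicates, and hindsight relabeling more concrete, here is a minimal Python sketch. It is not the authors' implementation. The task ("reach region A, then region B"), the disc-shaped predicates, and all names (`in_disc`, `RewardMachine`, `sequential_task`, `hindsight_relabel`) are assumptions chosen for illustration. The guards are plain nonlinear inequalities over a continuous state, standing in for formulas of a nonlinear arithmetic theory, and the relabeling rule is a simplified HER-style heuristic.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Tuple[float, float]      # continuous observation (x, y); hypothetical
Guard = Callable[[State], bool]  # predicate evaluated directly on the state


def in_disc(cx: float, cy: float, r: float) -> Guard:
    """Nonlinear-arithmetic predicate: (x - cx)^2 + (y - cy)^2 <= r^2."""
    return lambda s: (s[0] - cx) ** 2 + (s[1] - cy) ** 2 <= r ** 2


@dataclass
class RewardMachine:
    initial: int
    accepting: int
    # transitions[u] = list of (guard, next machine state, reward)
    transitions: Dict[int, List[Tuple[Guard, int, float]]]

    def step(self, u: int, s: State) -> Tuple[int, float]:
        """Take the first enabled transition; otherwise stay put with reward 0."""
        for guard, v, reward in self.transitions.get(u, []):
            if guard(s):
                return v, reward
        return u, 0.0

    def rewards(self, trace: List[State]) -> Tuple[List[float], int]:
        """Replay a finite trace; return per-step rewards and the final machine state."""
        u, out = self.initial, []
        for s in trace:
            u, reward = self.step(u, s)
            out.append(reward)
        return out, u


def sequential_task(goal_a: State, goal_b: State, radius: float = 0.5) -> RewardMachine:
    """Illustrative task: visit the disc around goal_a, then the disc around goal_b."""
    return RewardMachine(
        initial=0,
        accepting=2,
        transitions={
            0: [(in_disc(goal_a[0], goal_a[1], radius), 1, 0.0)],
            1: [(in_disc(goal_b[0], goal_b[1], radius), 2, 1.0)],
        },
    )


def hindsight_relabel(trace: List[State], radius: float = 0.5) -> RewardMachine:
    """Simplified HER-style relabeling (an assumption, not the paper's rule):
    reuse states that were actually visited as the two goals."""
    return sequential_task(trace[len(trace) // 2], trace[-1], radius)
```

Under these assumptions, a trajectory that never reaches the original goals earns no reward from `sequential_task`, but replaying the same trajectory through the machine returned by `hindsight_relabel` ends in the accepting state, which is the kind of dense learning signal that relabeling is meant to recover under sparse rewards.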
