An Offline Time-aware Apprenticeship Learning Framework for Evolving Reward Functions
This work addresses the challenge of offline apprenticeship learning under evolving reward functions in domains such as healthcare, though the contribution appears incremental because it builds on existing apprenticeship learning methods.
The paper tackles the problem of evolving reward functions in apprenticeship learning for human-centric tasks like healthcare, proposing the THEMES framework and demonstrating significant performance improvements over state-of-the-art baselines in sepsis treatment.
Apprenticeship learning (AL) is the process of inducing effective decision-making policies by observing and imitating experts' demonstrations. Most existing AL approaches, however, are not designed to cope with the evolving reward functions commonly found in human-centric tasks such as healthcare, where offline learning is required. In this paper, we propose an offline Time-aware Hierarchical EM Energy-based Sub-trajectory (THEMES) AL framework to handle evolving reward functions in such tasks. The effectiveness of THEMES is evaluated on a challenging task: sepsis treatment. The experimental results demonstrate that THEMES can significantly outperform competitive state-of-the-art baselines.