Truly Batch Apprenticeship Learning with Deep Successor Features
This work addresses batch apprenticeship learning for real-world applications such as healthcare and finance, where simulators are unavailable or data acquisition is costly.
The paper tackles the problem of apprenticeship learning from batch data without requiring dynamics models or additional data collection, introducing an algorithm that uses Deep Successor Feature Networks and a transition-regularized imitation network. It achieves superior results on control benchmarks and a clinical sepsis management task.
We introduce a novel apprenticeship learning algorithm to learn an expert's underlying reward structure in off-policy, model-free \emph{batch} settings. Unlike existing methods that require a dynamics model or additional data acquisition for on-policy evaluation, our algorithm requires only the batch data of observed expert behavior. Such settings are common in real-world tasks, such as health care, finance, or industrial processes, where accurate simulators do not exist or data acquisition is costly. To address the challenges of batch settings, we introduce Deep Successor Feature Networks (DSFN), which estimate feature expectations in an off-policy setting, and a transition-regularized imitation network, which produces a near-expert initial policy and an efficient feature representation. Our algorithm achieves superior results in batch settings on both control benchmarks and a vital clinical task of sepsis management in the Intensive Care Unit.
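To make concrete the link between feature expectations and successor features, the standard definitions can be sketched as follows. The notation is an assumption for illustration and may differ from the paper's: $\phi(s,a)$ is a feature map, $\gamma \in [0,1)$ a discount factor, $w$ a reward weight vector, and $\pi$ a deterministic policy; whether features are defined on states or on state-action pairs is also assumed here.

\begin{align}
\mu(\pi) &= \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, \phi(s_t, a_t) \,\Big|\, \pi\Big], &
r(s,a) = w^{\top}\phi(s,a) \;&\Rightarrow\; \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \,\Big|\, \pi\Big] = w^{\top}\mu(\pi), \\
\psi^{\pi}(s,a) &= \phi(s,a) + \gamma\, \mathbb{E}_{s'}\big[\psi^{\pi}(s', \pi(s'))\big], &
\mu(\pi) &= \mathbb{E}_{s_0}\big[\psi^{\pi}(s_0, \pi(s_0))\big].
\end{align}

Under these assumptions, the recursion for $\psi^{\pi}$ has the same form as a Bellman equation, so in principle it can be fit by temporal-difference updates on logged transitions $(s, a, s')$ without rolling out $\pi$ in a simulator, which is why successor features suit off-policy estimation of $\mu(\pi)$.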