A Flexible Framework for Incorporating Patient Preferences Into Q-Learning
This work addresses a healthcare-specific problem: making personalized treatment decisions when several outcomes matter at once. It is an incremental improvement over existing methods.
The authors address the problem of optimizing multivariate outcomes, such as treatment efficacy and side effects, in dynamic treatment regimes. They propose Latent Utility Q-Learning (LUQ-Learning), which adapts Q-learning to incorporate patient preferences and achieves competitive performance in simulations based on trials for low back pain and schizophrenia.
In real-world healthcare settings, treatment decisions often involve optimizing multivariate outcomes, such as treatment efficacy and severity of side effects, according to individual preferences. However, existing statistical methods for estimating dynamic treatment regimes (DTRs) usually assume a univariate outcome, and the few methods that handle composite outcomes are restricted to a single time point or offer limited theoretical guarantees. To address these limitations, we propose Latent Utility Q-Learning (LUQ-Learning), a latent model approach that adapts Q-learning to composite outcomes. Our framework allows for an arbitrary finite number of decision points and outcomes, incorporates personal preferences, and achieves asymptotic performance guarantees under realistic assumptions. We conduct simulation experiments based on an ongoing trial for low back pain as well as a well-known trial for schizophrenia. In both settings, LUQ-Learning achieves highly competitive performance compared to alternative baselines.
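To make the setting concrete, the following is a minimal sketch of ordinary two-stage Q-learning by backward induction on a scalar utility built from two outcomes. It is not the authors' LUQ-Learning procedure: the preference weight `w` is assumed to be known, whereas the abstract describes the utility as latent. The linear utility, the state and action labels, and the toy trajectories are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def utility(outcomes, w):
    """Scalar utility from (efficacy, side_effect); w in [0, 1] is an ASSUMED known weight."""
    efficacy, side_effect = outcomes
    return w * efficacy - (1.0 - w) * side_effect

def cell_means(pairs):
    """Average the values observed for each key (a tabular regression)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, value in pairs:
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

def fit_two_stage_q(trajectories, w, actions=(0, 1)):
    """Backward induction: estimate Q2 first, then Q1 from the stage-2 value."""
    q2 = cell_means(((s2, a2), utility(y, w)) for _, _, s2, a2, y in trajectories)

    def v2(s2):
        # Value of acting greedily at stage 2, over actions seen in the data.
        return max(q2[(s2, a)] for a in actions if (s2, a) in q2)

    q1 = cell_means(((s1, a1), v2(s2)) for s1, a1, s2, _, _ in trajectories)
    return q1, q2

def greedy(q, state, actions=(0, 1)):
    """Action with the highest estimated Q-value in the given state."""
    return max((a for a in actions if (state, a) in q), key=lambda a: q[(state, a)])

# Hypothetical trajectories: (state1, action1, state2, action2, (efficacy, side_effect)).
# Action 1 stands for a more aggressive treatment than action 0.
toy = [
    ("pain", 0, "stable", 0, (2.0, 0.0)),
    ("pain", 0, "stable", 1, (5.0, 4.0)),
    ("pain", 1, "improved", 0, (6.0, 3.0)),
    ("pain", 1, "improved", 1, (8.0, 8.0)),
]

q1_eff, q2_eff = fit_two_stage_q(toy, w=0.8)    # patient who mostly values efficacy
q1_safe, q2_safe = fit_two_stage_q(toy, w=0.2)  # patient who mostly avoids side effects

first_action_eff = greedy(q1_eff, "pain")
first_action_safe = greedy(q1_safe, "pain")
```

On this toy data the recommended first-stage action differs between the two weights, which is the sense in which a preference-dependent utility changes the estimated regime. A realistic analysis would replace the cell means with regression models and would have to estimate the utility rather than assume it.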