Improving the Generalizability of Depression Detection by Leveraging Clinical Questionnaires
This work addresses the challenge of deploying automated depression detection models in real-world healthcare by improving their generalizability and interpretability. The contribution is incremental: it adds clinical constraints to existing methods.
The paper tackles poor out-of-domain generalization and lack of trust in black-box depression detection models by grounding predictions in symptoms from the PHQ9 clinical questionnaire. In dataset-transfer experiments on three social media datasets, this grounding substantially improves generalization to out-of-distribution data compared to a standard BERT-based approach.
Automated methods have been widely used to identify and analyze mental health conditions (e.g., depression) from various sources of information, including social media. Yet, deployment of such models in real-world healthcare applications faces challenges including poor out-of-domain generalization and lack of trust in black-box models. In this work, we propose approaches for depression detection that are constrained to different degrees by the presence of symptoms described in PHQ9, a questionnaire used by clinicians in the depression screening process. In dataset-transfer experiments on three social media datasets, we find that grounding the model in PHQ9's symptoms substantially improves its ability to generalize to out-of-distribution data compared to a standard BERT-based approach. Furthermore, this approach can still perform competitively on in-domain data. These results and our qualitative analyses suggest that grounding model predictions in clinically relevant symptoms can improve generalizability while producing a model that is easier to inspect.
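To make the idea of a symptom-constrained classifier concrete, the following is a minimal sketch, not the paper's implementation. It assumes the most heavily constrained form of the idea: the final score may depend only on nine binary indicators, one per PHQ9 symptom. Keyword matching stands in for the learned symptom detectors a real system would use, and the pattern lists, the function names `active_symptoms` and `depression_score`, the uniform weights, and the bias are all hypothetical choices made for illustration.

```python
import math
import re

# Hypothetical keyword patterns, one per PHQ9 symptom. Illustrative only:
# a real system would use learned symptom detectors, not keyword lists.
SYMPTOM_PATTERNS = {
    "anhedonia": r"\b(lost interest|no interest|nothing is fun)\b",
    "depressed_mood": r"\b(hopeless|depressed|feeling down)\b",
    "sleep": r"\b(insomnia|can't sleep|cannot sleep)\b",
    "fatigue": r"\b(exhausted|no energy)\b",
    "appetite": r"\b(no appetite|overeating)\b",
    "worthlessness": r"\b(worthless|guilty)\b",
    "concentration": r"\b(can't focus|can't concentrate)\b",
    "psychomotor": r"\b(restless|sluggish)\b",
    "self_harm": r"\b(hurt myself|better off dead)\b",
}


def active_symptoms(text):
    """Return the names of the symptoms whose pattern matches the text."""
    lowered = text.lower()
    return [
        name
        for name, pattern in SYMPTOM_PATTERNS.items()
        if re.search(pattern, lowered)
    ]


def depression_score(text, weights=None, bias=-2.0):
    """Logistic score computed only from the nine symptom indicators.

    The weights and bias are hand-set assumptions, not fitted values.
    """
    if weights is None:
        weights = {name: 1.0 for name in SYMPTOM_PATTERNS}
    logit = bias + sum(weights[name] for name in active_symptoms(text))
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    post = "I feel hopeless and can't sleep anymore"
    print(active_symptoms(post))   # symptoms that drove the score
    print(depression_score(post))  # score in (0, 1)
```

Because the score is a function of the symptom indicators alone, the list returned by `active_symptoms` doubles as an explanation of any prediction, which is one way to read the abstract's claim that a symptom-grounded model is easier to inspect. Dataset-specific vocabulary cannot influence the output except through a symptom, which gives some intuition for why such a bottleneck might transfer better across datasets.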