CL, LG | Jul 2, 2024

Evaluating the Robustness of Adverse Drug Event Classification Models Using Templates

Dorothea MacPhail, David Harbecke, Lisa Raithel, Sebastian Möller

arXiv:2407.02432v114.427 citations | h-index: 6 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for thorough evaluation in high-stakes medical applications like ADE detection from social media, though it is incremental as it focuses on evaluation rather than new detection methods.

The paper evaluates the robustness of adverse drug event (ADE) classification models using hand-crafted templates that test capabilities such as temporal order and negation. It finds that models with similar test-set performance vary significantly on these capabilities.

An adverse drug effect (ADE) is any harmful event resulting from medical drug treatment. Despite their importance, ADEs are often under-reported in official channels. Some research has therefore turned to detecting discussions of ADEs in social media. Impressive results have been achieved in various attempts to detect ADEs. In a high-stakes domain such as medicine, however, an in-depth evaluation of a model's abilities is crucial. We address the issue of thorough performance evaluation in English-language ADE detection with hand-crafted templates for four capabilities: Temporal order, negation, sentiment, and beneficial effect. We find that models with similar performance on held-out test sets have varying results on these capabilities.
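As a rough illustration of how template-based capability testing can work, here is a minimal sketch. It assumes a binary classifier interface (1 = text reports an ADE, 0 = it does not). The templates, filler words, label convention, and the `keyword_baseline` stand-in model are all hypothetical and are not taken from the paper or its code.

```python
from collections import defaultdict
from itertools import product

# Hypothetical templates: (capability, text with slots, expected label).
# Label 1 = the text reports an ADE, 0 = it does not (assumed convention).
TEMPLATES = [
    ("negation", "I took {drug} and did not get any {symptom}.", 0),
    ("negation", "I took {drug} and got {symptom}.", 1),
    ("temporal_order", "I had {symptom} long before I started {drug}.", 0),
    ("temporal_order", "I started {drug} and then had {symptom}.", 1),
]

# Hypothetical slot fillers used to expand each template.
FILLERS = {
    "drug": ["ibuprofen", "aspirin"],
    "symptom": ["nausea", "headaches"],
}


def expand(templates, fillers):
    """Fill every template with every combination of slot values."""
    keys = sorted(fillers)
    cases = []
    for capability, template, label in templates:
        for values in product(*(fillers[k] for k in keys)):
            text = template.format(**dict(zip(keys, values)))
            cases.append((capability, text, label))
    return cases


def accuracy_by_capability(classify, cases):
    """Score a classifier (text -> 0 or 1) separately for each capability."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for capability, text, label in cases:
        total[capability] += 1
        correct[capability] += int(classify(text) == label)
    return {c: correct[c] / total[c] for c in total}


def keyword_baseline(text):
    """Stand-in model: predicts an ADE whenever a symptom word appears."""
    return int(any(symptom in text for symptom in FILLERS["symptom"]))


if __name__ == "__main__":
    cases = expand(TEMPLATES, FILLERS)
    print(accuracy_by_capability(keyword_baseline, cases))
```

Because the stand-in model ignores negation and word order, it gets only the positive half of each capability right. Reporting accuracy per capability is what exposes this kind of gap, which an aggregate score on a held-out test set can hide.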
