CL LG · Feb 10, 2024

DAEDRA: A language model for predicting outcomes in passive pharmacovigilance reporting

arXiv:2402.10951v1 · 4 citations · h-index: 3

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of analyzing diverse adverse event reports in pharmacovigilance for regulatory purposes, but it is incremental as it builds on existing language model approaches with modest gains.

The paper tackled the problem of detecting regulatory-relevant outcomes like mortality and hospitalizations in passive pharmacovigilance reports, which are difficult to analyze due to diverse sources, and achieved a small but significant improvement with increases of 1% in F1, 2.5% in precision, and 3.8% in recall.

In recent years, the emergence of large language models (LLMs) has given rise to a proliferation of domain-specific models intended to reflect the linguistic context and content particular to their originating domain. This paper details the conception, design, training and evaluation of DAEDRA, an LLM designed to detect regulatory-relevant outcomes (mortality, ER attendance and hospitalisation) in adverse event reports elicited through passive reporting (PR). PR is a highly cost-efficient way of eliciting information from a wide and diverse audience, typically including not only physicians and healthcare providers but also patients, family members and other lay stakeholders. This diversity, however, makes PR corpora difficult to analyse: generic language models may not capture the complex clinical dimensions, while specific clinical or biomedical models may not perform well on lay reports. To evaluate the utility of a subdomain-specific language model, an adaptive training approach was adopted, wherein base language model candidates were evaluated on a subset of the corpus, and the best performer was trained on the entire corpus. This yielded a small but significant improvement in $F_1$ (+1%), precision (+2.5%) and recall (+3.8%), at a relatively low training cost and a single-day training time. Subdomain-specific LLMs continue to be viable options for better results when analysing highly specialised corpora.
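The candidate-selection step described in the abstract (score each base model on a subset, then carry the best one forward to full training) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the abstract does not say which metric ranked the candidates, so ranking by $F_1$ is an assumption, as is the reduction to a single binary outcome. The names `precision_recall_f1`, `select_best_candidate`, the toy reports and the two keyword-based "candidates" are all hypothetical stand-ins for fine-tuned base models and real report data.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-in for a fine-tuned candidate: maps report text to 0 or 1.
Predictor = Callable[[str], int]


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Binary precision, recall and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def select_best_candidate(
    candidates: Dict[str, Predictor],
    subset: List[Tuple[str, int]],
) -> Tuple[str, Dict[str, float]]:
    """Score every candidate on the subset; return the best name and all F1 scores."""
    texts = [text for text, _ in subset]
    labels = [label for _, label in subset]
    scores = {}
    for name, predict in candidates.items():
        preds = [predict(text) for text in texts]
        scores[name] = precision_recall_f1(labels, preds)[2]
    best = max(scores, key=scores.get)
    return best, scores


# Toy subset (invented for illustration): label 1 means "hospitalisation reported".
subset = [
    ("patient was admitted to hospital overnight", 1),
    ("mild sore arm after the shot", 0),
    ("taken to the ER, later hospitalised", 1),
    ("felt tired for a day", 0),
]

# Toy candidates (assumptions); in practice each would be a fine-tuned base model.
candidates = {
    "keyword_hospital": lambda text: int("hospital" in text),
    "always_positive": lambda text: 1,
}

best, scores = select_best_candidate(candidates, subset)
print(best, scores)
```

In this sketch only the winning candidate would then be trained on the full corpus, which is what keeps the overall training cost low relative to fully training every base model.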
