CL LG · Mar 17, 2024

Cheap Ways of Extracting Clinical Markers from Texts

Anastasia Sandu, Teodor Mihailescu, Sergiu Nisioi

arXiv:2403.11227v1 · 26.2104 citations · h-index: 7 · Has Code · CLPsych

Originality: Synthesis-oriented

AI Analysis

This work addresses suicide risk evaluation in clinical settings, but it is incremental: it compares existing methods without introducing new techniques.

The paper tackles the extraction of clinical markers from text for suicide risk assessment by comparing a traditional machine learning pipeline with a large language model (LLM) approach. The LLM method is more resource-intensive, but with chain-of-thought guidance it generates the summaries and returns text sequences that indicate clinical markers.

This paper describes the work of the UniBuc Archaeology team for the CLPsych 2024 Shared Task, which involved finding evidence within the text that supports the assigned suicide risk level. Two types of evidence were required: highlights (relevant spans extracted from the text) and summaries (evidence aggregated into a synthesis). Our work evaluates Large Language Models (LLMs) against an alternative method that is much more memory and resource efficient. The first approach employs a good old-fashioned machine learning (GOML) pipeline consisting of a tf-idf vectorizer with a logistic regression classifier, whose representative features are used to extract relevant highlights. The second, more resource-intensive approach uses an LLM to generate the summaries and is guided by chain-of-thought to provide sequences of text indicating clinical markers.
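The GOML idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it is a pure-Python stand-in that assumes a binary label, a naive regex tokenizer, SGD training, and sentence-level highlights scored by summed feature weights. All function names (`fit_idf`, `tfidf`, `train_logreg`, `extract_highlights`) and the toy posts are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; a deliberately simple tokenizer (assumption).
    return re.findall(r"[a-z']+", text.lower())


def fit_idf(docs):
    # Smoothed inverse document frequency for every training token.
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}


def tfidf(text, idf):
    # Sparse, L2-normalised tf-idf vector; tokens unseen in training are ignored.
    counts = Counter(tok for tok in tokenize(text) if tok in idf)
    vec = {tok: count * idf[tok] for tok, count in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {tok: v / norm for tok, v in vec.items()}


def train_logreg(vectors, labels, epochs=200, lr=0.5, l2=0.01):
    # Binary logistic regression trained with plain stochastic gradient descent.
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            z = bias + sum(weights.get(tok, 0.0) * v for tok, v in vec.items())
            err = 1.0 / (1.0 + math.exp(-z)) - y
            for tok, v in vec.items():
                w = weights.get(tok, 0.0)
                weights[tok] = w - lr * (err * v + l2 * w)
            bias -= lr * err
    return weights, bias


def extract_highlights(post, weights, idf, top_k=1):
    # Rank sentences by how strongly their tf-idf features push towards label 1.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", post) if s.strip()]

    def score(sentence):
        return sum(weights.get(tok, 0.0) * v
                   for tok, v in tfidf(sentence, idf).items())

    return sorted(sentences, key=score, reverse=True)[:top_k]


# Toy, invented data: label 1 marks the higher-risk class, 0 the lower-risk class.
train_docs = [
    "I feel hopeless and cannot sleep at night",
    "Everything feels hopeless and I am so tired of it all",
    "Had a nice walk today and cooked dinner with friends",
    "Work was busy but the weekend trip was great fun",
]
train_labels = [1, 1, 0, 0]

idf = fit_idf(train_docs)
weights, bias = train_logreg([tfidf(d, idf) for d in train_docs], train_labels)

post = "I went to work today. I feel hopeless and tired of it all. Dinner was fine."
highlights = extract_highlights(post, weights, idf, top_k=1)
print(highlights)
```

In practice the same pipeline would more likely be built from a library such as scikit-learn (`TfidfVectorizer` plus `LogisticRegression`, reading the learned weights from `coef_`); the hand-rolled version here only keeps the sketch dependency-free. Scoring whole sentences is one possible way to turn feature weights into highlights; the abstract does not specify how spans are selected.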
