CL | Feb 2

Large Language Models for Mental Health: A Multilingual Evaluation

arXiv:2602.02440v1 | h-index: 11 | Has Code
AI Analysis

This work assesses how effective LLMs are in multilingual mental health applications, a question relevant to both researchers and practitioners. It is incremental in that it focuses on evaluation rather than on new methods.

The paper evaluated large language models (LLMs) on multilingual mental health datasets, finding that proprietary and fine-tuned open-source LLMs achieved competitive F1 scores, often surpassing state-of-the-art results, but performance declined on machine-translated data with variations by language and typology.

Large Language Models (LLMs) have remarkable capabilities across NLP tasks. However, their performance in multilingual contexts, especially within the mental health domain, has not been thoroughly explored. In this paper, we evaluate proprietary and open-source LLMs on eight mental health datasets in various languages, as well as their machine-translated (MT) counterparts. We compare LLM performance in zero-shot, few-shot, and fine-tuned settings against conventional NLP baselines that do not employ LLMs. In addition, we assess translation quality across language families and typologies to understand its influence on LLM performance. Proprietary LLMs and fine-tuned open-source LLMs achieve competitive F1 scores on several datasets, often surpassing state-of-the-art results. However, performance on MT data is generally lower, and the extent of this decline varies by language and typology. This variation highlights both the strengths of LLMs in handling mental health tasks in languages other than English and their limitations when translation quality introduces structural or lexical mismatches.

