CLMar 4
Evaluating Large Language Models' Responses to Sexual and Reproductive Health Queries in NepaliMedha Sharma, Supriya Khadka, Udit Chandra Aryal et al.
As Large Language Models (LLMs) become integrated into daily life, they are increasingly used for personal queries, including Sexual and Reproductive Health (SRH), allowing users to chat anonymously without fear of judgment. However, current evaluation methods primarily focus on accuracy, often for objective queries in high-resource languages, and lack criteria to assess usability and safety, especially for low-resource languages and culturally sensitive domains like SRH. This paper introduces LLM Evaluation Framework (LEAF), that conducts assessments across multiple criteria: accuracy, language, usability gaps (including relevance, adequacy, and cultural appropriateness), and safety gaps (safety, sensitivity, and confidentiality). Using the LEAF framework, we assessed 14K SRH queries in Nepali from over 9K users. Responses were manually annotated by SRH experts according to the framework. Results revealed that only 35.1% of the responses were "proper", meaning they were accurate, adequate and had no major usability or safety related gaps. Insights include differences in performance between ChatGPT versions, such as similar accuracy but varying usability and safety aspects. This evaluation highlights significant limitations of current LLMs and underscores the need for improvement. The LEAF Framework is adaptable across domains and languages, particularly where usability and safety are critical, offering a pathway to better address sensitive topics.
CVMay 29, 2025
Identification of Patterns of Cognitive Impairment for Early Detection of DementiaAnusha A. S., Uma Ranjan, Medha Sharma et al.
Early detection of dementia is crucial to devise effective interventions. Comprehensive cognitive tests, while being the most accurate means of diagnosis, are long and tedious, thus limiting their applicability to a large population, especially when periodic assessments are needed. The problem is compounded by the fact that people have differing patterns of cognitive impairment as they progress to different forms of dementia. This paper presents a novel scheme by which individual-specific patterns of impairment can be identified and used to devise personalized tests for periodic follow-up. Patterns of cognitive impairment are initially learned from a population cluster of combined normals and MCIs, using a set of standardized cognitive tests. Impairment patterns in the population are identified using a 2-step procedure involving an ensemble wrapper feature selection followed by cluster identification and analysis. These patterns have been shown to correspond to clinically accepted variants of MCI, a prodrome of dementia. The learned clusters of patterns can subsequently be used to identify the most likely route of cognitive impairment, even for pre-symptomatic and apparently normal people. Baseline data of 24,000 subjects from the NACC database was used for the study.