Bárbara Malcorra

CL
h-index24
3papers
5citations
Novelty42%
AI Score32

3 Papers

CLDec 15, 2025
Beyond surface form: A pipeline for semantic analysis in Alzheimer's Disease detection from spontaneous speech

Dylan Phelps, Rodrigo Wilkens, Edward Gow-Smith et al.

Alzheimer's Disease (AD) is a progressive neurodegenerative condition that adversely affects cognitive abilities. Language-related changes can be automatically identified through the analysis of outputs from linguistic assessment tasks, such as picture description. Language models show promise as a basis for screening tools for AD, but their limited interpretability poses a challenge in distinguishing true linguistic markers of cognitive decline from surface-level textual patterns. To address this issue, we examine how surface form variation affects classification performance, with the goal of assessing the ability of language models to represent underlying semantic indicators. We introduce a novel approach where texts surface forms are transformed by altering syntax and vocabulary while preserving semantic content. The transformations significantly modify the structure and lexical content, as indicated by low BLEU and chrF scores, yet retain the underlying semantics, as reflected in high semantic similarity scores, isolating the effect of semantic information, and finding models perform similarly to if they were using the original text, with only small deviations in macro-F1. We also investigate whether language from picture descriptions retains enough detail to reconstruct the original image using generative models. We found that image-based transformations add substantial noise reducing classification accuracy. Our methodology provides a novel way of looking at what features influence model predictions, and allows the removal of possible spurious correlations. We find that just using semantic information, language model based classifiers can still detect AD. This work shows that difficult to detect semantic impairment can be identified, addressing an overlooked feature of linguistic deterioration, and opening new pathways for early detection systems.

CLSep 30, 2024
A Methodology for Explainable Large Language Models with Integrated Gradients and Linguistic Analysis in Text Classification

Marina Ribeiro, Bárbara Malcorra, Natália B. Mota et al.

Neurological disorders that affect speech production, such as Alzheimer's Disease (AD), significantly impact the lives of both patients and caregivers, whether through social, psycho-emotional effects or other aspects not yet fully understood. Recent advancements in Large Language Model (LLM) architectures have developed many tools to identify representative features of neurological disorders through spontaneous speech. However, LLMs typically lack interpretability, meaning they do not provide clear and specific reasons for their decisions. Therefore, there is a need for methods capable of identifying the representative features of neurological disorders in speech and explaining clearly why these features are relevant. This paper presents an explainable LLM method, named SLIME (Statistical and Linguistic Insights for Model Explanation), capable of identifying lexical components representative of AD and indicating which components are most important for the LLM's decision. In developing this method, we used an English-language dataset consisting of transcriptions from the Cookie Theft picture description task. The LLM Bidirectional Encoder Representations from Transformers (BERT) classified the textual descriptions as either AD or control groups. To identify representative lexical features and determine which are most relevant to the model's decision, we used a pipeline involving Integrated Gradients (IG), Linguistic Inquiry and Word Count (LIWC), and statistical analysis. Our method demonstrates that BERT leverages lexical components that reflect a reduction in social references in AD and identifies which further improve the LLM's accuracy. Thus, we provide an explainability tool that enhances confidence in applying LLMs to neurological clinical contexts, particularly in the study of neurodegeneration.

CLFeb 10, 2025
"Once Upon a Time..." Literary Narrative Connectedness Progresses with Grade Level: Potential Impact on Reading Fluency and Literacy Skills

Marina Ribeiro, Bárbara Malcorra, Diego Pintor et al.

Selecting an appropriate book is crucial for fostering reading habits in children. While children exhibit varying levels of complexity when generating oral narratives, the question arises: do children's books also differ in narrative complexity? This study explores the narrative dynamics of literary texts used in schools, focusing on how their complexity evolves across different grade levels. Using Word-Recurrence Graph Analysis, we examined a dataset of 1,627 literary texts spanning 13 years of education. The findings reveal significant exponential growth in connectedness, particularly during the first three years of schooling, mirroring patterns observed in children's oral narratives. These results highlight the potential of literary texts as a tool to support the development of literacy skills.