David Tomás

h-index15

5papers

22citations

Novelty18%

AI Score22

Ranked #179,493 of 194,257 authors (top 92%)#38,507 in LG (top 96%)

5 Papers

9.6CLMay 28, 2025

NLP for Social Good: A Survey of Challenges, Opportunities, and Responsible Deployment

Antonia Karamolegkou, Angana Borah, Eunjung Cho et al.

Recent advancements in large language models (LLMs) have unlocked unprecedented possibilities across a range of applications. However, as a community, we believe that the field of Natural Language Processing (NLP) has a growing need to approach deployment with greater intentionality and responsibility. In alignment with the broader vision of AI for Social Good (Tomašev et al., 2020), this paper examines the role of NLP in addressing pressing societal challenges. Through a cross-disciplinary analysis of social goals and emerging risks, we highlight promising research directions and outline challenges that must be addressed to ensure responsible and equitable progress in NLP4SG research.

4.6LGOct 24, 2024

Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques

David Ortiz-Perez, Manuel Benavent-Lledo, Jose Garcia-Rodriguez et al.

Cognitive decline is a natural part of aging. However, under some circumstances, this decline is more pronounced than expected, typically due to disorders such as Alzheimer's disease. Early detection of an anomalous decline is crucial, as it can facilitate timely professional intervention. While medical data can help, it often involves invasive procedures. An alternative approach is to employ non-intrusive techniques such as speech or handwriting analysis, which do not disturb daily activities. This survey reviews the most relevant non-intrusive methodologies that use deep learning techniques to automate the cognitive decline detection task, including audio, text, and visual processing. We discuss the key features and advantages of each modality and methodology, including state-of-the-art approaches like Transformer architecture and foundation models. In addition, we present studies that integrate different modalities to develop multimodal models. We also highlight the most significant datasets and the quantitative results from studies using these resources. From this review, several conclusions emerge. In most cases, text-based approaches consistently outperform other modalities. Furthermore, combining various approaches from individual modalities into a multimodal model consistently enhances performance across nearly all scenarios.

13.0LGJun 2, 2025

CogniAlign: Word-Level Multimodal Speech Alignment with Gated Cross-Attention for Alzheimer's Detection

David Ortiz-Perez, Manuel Benavent-Lledo, Javier Rodriguez-Juan et al.

Early detection of cognitive disorders such as Alzheimer's disease is critical for enabling timely clinical intervention and improving patient outcomes. In this work, we introduce CogniAlign, a multimodal architecture for Alzheimer's detection that integrates audio and textual modalities, two non-intrusive sources of information that offer complementary insights into cognitive health. Unlike prior approaches that fuse modalities at a coarse level, CogniAlign leverages a word-level temporal alignment strategy that synchronizes audio embeddings with corresponding textual tokens based on transcription timestamps. This alignment supports the development of token-level fusion techniques, enabling more precise cross-modal interactions. To fully exploit this alignment, we propose a Gated Cross-Attention Fusion mechanism, where audio features attend over textual representations, guided by the superior unimodal performance of the text modality. In addition, we incorporate prosodic cues, specifically interword pauses, by inserting pause tokens into the text and generating audio embeddings for silent intervals, further enriching both streams. We evaluate CogniAlign on the ADReSSo dataset, where it achieves an accuracy of 87.35% over a Leave-One-Subject-Out setup and of 90.36% over a 5 fold Cross-Validation, outperforming existing state-of-the-art methods. A detailed ablation study confirms the advantages of our alignment strategy, attention-based fusion, and prosodic modeling. Finally, we perform a corpus analysis to assess the impact of the proposed prosodic features and apply Integrated Gradients to identify the most influential input segments used by the model in predicting cognitive health outcomes.

7.9LGJun 11, 2024Code

Cognitive Insights Across Languages: Enhancing Multimodal Interview Analysis

David Ortiz-Perez, Jose Garcia-Rodriguez, David Tomás

Cognitive decline is a natural process that occurs as individuals age. Early diagnosis of anomalous decline is crucial for initiating professional treatment that can enhance the quality of life of those affected. To address this issue, we propose a multimodal model capable of predicting Mild Cognitive Impairment and cognitive scores. The TAUKADIAL dataset is used to conduct the evaluation, which comprises audio recordings of clinical interviews. The proposed model demonstrates the ability to transcribe and differentiate between languages used in the interviews. Subsequently, the model extracts audio and text features, combining them into a multimodal architecture to achieve robust and generalized results. Our approach involves in-depth research to implement various features obtained from the proposed modalities.

2.6CVJul 8, 2021

Exploiting the relationship between visual and textual features in social networks for image classification with zero-shot deep learning

Luis Lucas, David Tomas, Jose Garcia-Rodriguez

One of the main issues related to unsupervised machine learning is the cost of processing and extracting useful information from large datasets. In this work, we propose a classifier ensemble based on the transferable learning capabilities of the CLIP neural network architecture in multimodal environments (image and text) from social media. For this purpose, we used the InstaNY100K dataset and proposed a validation approach based on sampling techniques. Our experiments, based on image classification tasks according to the labels of the Places dataset, are performed by first considering only the visual part, and then adding the associated texts as support. The results obtained demonstrated that trained neural networks such as CLIP can be successfully applied to image classification with little fine-tuning, and considering the associated texts to the images can help to improve the accuracy depending on the goal. The results demonstrated what seems to be a promising research direction.