Jorge Dueñas-Lerín

IRAug 3, 2023

Incorporating Recklessness to Collaborative Filtering based Recommender Systems

Diego Pérez-López, Fernando Ortega, Ángel González-Prieto et al.

Recommender systems are intrinsically tied to a reliability/coverage dilemma: The more reliable we desire the forecasts, the more conservative the decision will be and thus, the fewer items will be recommended. This causes a detriment to the predictive capability of the system, as it is only able to estimate potential interest in items for which there is a consensus in their evaluation, rather than being able to estimate potential interest in any item. In this paper, we propose the inclusion of a new term in the learning process of matrix factorization-based recommender systems, called recklessness, that takes into account the variance of the output probability distribution of the predicted ratings. In this way, gauging this recklessness measure we can force more spiky output distribution, enabling the control of the risk level desired when making decisions about the reliability of a prediction. Experimental results demonstrate that recklessness not only allows for risk regulation but also improves the quantity and quality of predictions provided by the recommender system.

80.3CLMay 20

Automated ICD Classification of Psychiatric Diagnoses: From Classical NLP to Large Language Models

Fernando Ortega, Raúl Lara-Cabrera, Jorge Dueñas-Lerín et al.

Mental health has become a global priority, leading to a massive administrative burden in the coding of clinical diagnoses. This study proposes the automation of psychiatric diagnostic analysis by mapping free-text descriptions to the International Classification of Diseases (ICD) using Natural Language Processing (NLP) and Machine Learning (ML) techniques. Utilizing a specialized dataset of 145,513 Spanish psychiatric descriptions, various text representation paradigms were evaluated, ranging from classical frequency-based models (BoW, TF-IDF) to state-of-the-art Large Language Models (LLMs) such as e5\_large, BioLORD, and Llama-3-8B. Results indicate that transformer-based embeddings consistently outperform traditional methods by capturing implicit semantic cues and nuanced medical terminology. The e5\_large model, through end-to-end fine-tuning, achieved the highest performance with a $F1_{micro}$ score of 0.866. This research demonstrates that adapting LLMs to specific clinical nomenclature is essential for overcoming the challenges of ``long-tail'' label distributions and the inherent ambiguity of psychiatric discourse.

Jorge Dueñas-Lerín

2 Papers