Giuseppe Riccio

CL
h-index9
10papers
64citations
Novelty34%
AI Score54

10 Papers

CLNov 5, 2025Code
SOLVE-Med: Specialized Orchestration for Leading Vertical Experts across Medical Specialties

Roberta Di Marino, Giovanni Dioguardi, Antonio Romano et al.

Medical question answering systems face deployment challenges including hallucinations, bias, computational demands, privacy concerns, and the need for specialized expertise across diverse domains. Here, we present SOLVE-Med, a multi-agent architecture combining domain-specialized small language models for complex medical queries. The system employs a Router Agent for dynamic specialist selection, ten specialized models (1B parameters each) fine-tuned on specific medical domains, and an Orchestrator Agent that synthesizes responses. Evaluated on Italian medical forum data across ten specialties, SOLVE-Med achieves superior performance with ROUGE-1 of 0.301 and BERTScore F1 of 0.697, outperforming standalone models up to 14B parameters while enabling local deployment. Our code is publicly available on GitHub: https://github.com/PRAISELab-PicusLab/SOLVE-Med.

CVFeb 25
Brain3D: Brain Report Automation via Inflated Vision Transformers in 3D

Mariano Barone, Francesco Di Serio, Giuseppe Riccio et al.

Current medical vision-language models (VLMs) process volumetric brain MRI using 2D slice-based approximations, fragmenting the spatial context required for accurate neuroradiological interpretation. We developed \textbf{Brain3D}, a staged vision-language framework for automated radiology report generation from 3D brain tumor MRI. Our approach inflates a pretrained 2D medical encoder into a native 3D architecture and progressively aligns it with a causal language model through three stages: contrastive grounding, supervised projector warmup, and LoRA-based linguistic specialization. Unlike generalist 3D medical VLMs, \textbf{Brain3D} is tailored to neuroradiology, where hemispheric laterality, tumor infiltration patterns, and anatomical localization are critical. Evaluated on 468 subjects (BraTS pathological cases plus healthy controls), our model achieves a Clinical Pathology F1 of 0.951 versus 0.413 for a strong 2D baseline while maintaining perfect specificity on healthy scans. The staged alignment proves essential: contrastive grounding establishes visual-textual correspondence, projector warmup stabilizes conditioning, and LoRA adaptation shifts output from verbose captions to structured clinical reports\footnote{Our code is publicly available for transparency and reproducibility

CLSep 17, 2025Code
Combating Biomedical Misinformation through Multi-modal Claim Detection and Evidence-based Verification

Mariano Barone, Antonio Romano, Giuseppe Riccio et al.

Misinformation in healthcare, from vaccine hesitancy to unproven treatments, poses risks to public health and trust in medical systems. While machine learning and natural language processing have advanced automated fact-checking, validating biomedical claims remains uniquely challenging due to complex terminology, the need for domain expertise, and the critical importance of grounding in scientific evidence. We introduce CER (Combining Evidence and Reasoning), a novel framework for biomedical fact-checking that integrates scientific evidence retrieval, reasoning via large language models, and supervised veracity prediction. By integrating the text-generation capabilities of large language models with advanced retrieval techniques for high-quality biomedical scientific evidence, CER effectively mitigates the risk of hallucinations, ensuring that generated outputs are grounded in verifiable, evidence-based sources. Evaluations on expert-annotated datasets (HealthFC, BioASQ-7b, SciFact) demonstrate state-of-the-art performance and promising cross-dataset generalization. Code and data are released for transparency and reproducibility: https://github.com/PRAISELab-PicusLab/CER

CLOct 21, 2025Code
DART: A Structured Dataset of Regulatory Drug Documents in Italian for Clinical NLP

Mariano Barone, Antonio Laudante, Giuseppe Riccio et al.

The extraction of pharmacological knowledge from regulatory documents has become a key focus in biomedical natural language processing, with applications ranging from adverse event monitoring to AI-assisted clinical decision support. However, research in this field has predominantly relied on English-language corpora such as DrugBank, leaving a significant gap in resources tailored to other healthcare systems. To address this limitation, we introduce DART (Drug Annotation from Regulatory Texts), the first structured corpus of Italian Summaries of Product Characteristics derived from the official repository of the Italian Medicines Agency (AIFA). The dataset was built through a reproducible pipeline encompassing web-scale document retrieval, semantic segmentation of regulatory sections, and clinical summarization using a few-shot-tuned large language model with low-temperature decoding. DART provides structured information on key pharmacological domains such as indications, adverse drug reactions, and drug-drug interactions. To validate its utility, we implemented an LLM-based drug interaction checker that leverages the dataset to infer clinically meaningful interactions. Experimental results show that instruction-tuned LLMs can accurately infer potential interactions and their clinical implications when grounded in the structured textual fields of DART. We publicly release our code on GitHub: https://github.com/PRAISELab-PicusLab/DART.

CLOct 21, 2025Code
IMB: An Italian Medical Benchmark for Question Answering

Antonio Romano, Giuseppe Riccio, Mariano Barone et al.

Online medical forums have long served as vital platforms where patients seek professional healthcare advice, generating vast amounts of valuable knowledge. However, the informal nature and linguistic complexity of forum interactions pose significant challenges for automated question answering systems, especially when dealing with non-English languages. We present two comprehensive Italian medical benchmarks: \textbf{IMB-QA}, containing 782,644 patient-doctor conversations from 77 medical categories, and \textbf{IMB-MCQA}, comprising 25,862 multiple-choice questions from medical specialty examinations. We demonstrate how Large Language Models (LLMs) can be leveraged to improve the clarity and consistency of medical forum data while retaining their original meaning and conversational style, and compare a variety of LLM architectures on both open and multiple-choice question answering tasks. Our experiments with Retrieval Augmented Generation (RAG) and domain-specific fine-tuning reveal that specialized adaptation strategies can outperform larger, general-purpose models in medical question answering tasks. These findings suggest that effective medical AI systems may benefit more from domain expertise and efficient information retrieval than from increased model scale. We release both datasets and evaluation frameworks in our GitHub repository to support further research on multilingual medical question answering: https://github.com/PRAISELab-PicusLab/IMB.

CLSep 17, 2025Code
Combining Evidence and Reasoning for Biomedical Fact-Checking

Mariano Barone, Antonio Romano, Giuseppe Riccio et al.

Misinformation in healthcare, from vaccine hesitancy to unproven treatments, poses risks to public health and trust in medical systems. While machine learning and natural language processing have advanced automated fact-checking, validating biomedical claims remains uniquely challenging due to complex terminology, the need for domain expertise, and the critical importance of grounding in scientific evidence. We introduce CER (Combining Evidence and Reasoning), a novel framework for biomedical fact-checking that integrates scientific evidence retrieval, reasoning via large language models, and supervised veracity prediction. By integrating the text-generation capabilities of large language models with advanced retrieval techniques for high-quality biomedical scientific evidence, CER effectively mitigates the risk of hallucinations, ensuring that generated outputs are grounded in verifiable, evidence-based sources. Evaluations on expert-annotated datasets (HealthFC, BioASQ-7b, SciFact) demonstrate state-of-the-art performance and promising cross-dataset generalization. Code and data are released for transparency and reproducibility: https: //github.com/PRAISELab-PicusLab/CER.

CLApr 22
Can "AI" Be a Doctor? A Study of Empathy, Readability, and Alignment in Clinical LLMs

Mariano Barone, Francesco Di Serio, Roberto Moio et al.

Large Language Models (LLMs) are increasingly deployed in healthcare, yet their communicative alignment with clinical standards remains insufficiently quantified. We conduct a multidimensional evaluation of general-purpose and domain-specialized LLMs across structured medical explanations and real-world physician-patient interactions, analyzing semantic fidelity, readability, and affective resonance. Baseline models amplify affective polarity relative to physicians (Very Negative: 43.14-45.10% vs. 37.25%) and, in larger architectures such as GPT-5 and Claude, produce substantially higher linguistic complexity (FKGL up to 16.91-17.60 vs. 11.47-12.50 in physician-authored responses). Empathy-oriented prompting reduces extreme negativity and lowers grade-level complexity (up to -6.87 FKGL points for GPT-5) but does not significantly increase semantic fidelity. Collaborative rewriting yields the strongest overall alignment. Rephrase configurations achieve the highest semantic similarity to physician answers (up to mean = 0.93) while consistently improving readability and reducing affective extremity. Dual stakeholder evaluation shows that no model surpasses physicians on epistemic criteria, whereas patients consistently prefer rewritten variants for clarity and emotional tone. These findings suggest that LLMs function most effectively as collaborative communication enhancers rather than replacements for clinical expertise.

GEO-PHMar 6, 2021
A novel approach to the classification of terrestrial drainage networks based on deep learning and preliminary results on Solar System bodies

Carlo Donadio, Massimo Brescia, Alessia Riccardo et al.

Several approaches were proposed to describe the geomorphology of drainage networks and the abiotic/biotic factors determining their morphology. There is an intrinsic complexity of the explicit qualification of the morphological variations in response to various types of control factors and the difficulty of expressing the cause-effect links. Traditional methods of drainage network classification are based on the manual extraction of key characteristics, then applied as pattern recognition schemes. These approaches, however, have low predictive and uniform ability. We present a different approach, based on the data-driven supervised learning by images, extended also to extraterrestrial cases. With deep learning models, the extraction and classification phase is integrated within a more objective, analytical, and automatic framework. Despite the initial difficulties, due to the small number of training images available, and the similarity between the different shapes of the drainage samples, we obtained successful results, concluding that deep learning is a valid way for data exploration in geomorphology and related fields.

IMMay 25, 2015
Machine learning based data mining for Milky Way filamentary structures reconstruction

Giuseppe Riccio, Stefano Cavuoti, Eugenio Schisano et al.

We present an innovative method called FilExSeC (Filaments Extraction, Selection and Classification), a data mining tool developed to investigate the possibility to refine and optimize the shape reconstruction of filamentary structures detected with a consolidated method based on the flux derivative analysis, through the column-density maps computed from Herschel infrared Galactic Plane Survey (Hi-GAL) observations of the Galactic plane. The present methodology is based on a feature extraction module followed by a machine learning model (Random Forest) dedicated to select features and to classify the pixels of the input images. From tests on both simulations and real observations the method appears reliable and robust with respect to the variability of shape and distribution of filaments. In the cases of highly defined filament structures, the presented method is able to bridge the gaps among the detected fragments, thus improving their shape reconstruction. From a preliminary "a posteriori" analysis of derived filament physical parameters, the method appears potentially able to add a sufficient contribution to complete and refine the filament reconstruction.

MED-PHJan 16, 2015
Feature Selection based on Machine Learning in MRIs for Hippocampal Segmentation

Sabina Tangaro, Nicola Amoroso, Massimo Brescia et al.

Neurodegenerative diseases are frequently associated with structural changes in the brain. Magnetic Resonance Imaging (MRI) scans can show these variations and therefore be used as a supportive feature for a number of neurodegenerative diseases. The hippocampus has been known to be a biomarker for Alzheimer disease and other neurological and psychiatric diseases. However, it requires accurate, robust and reproducible delineation of hippocampal structures. Fully automatic methods are usually the voxel based approach, for each voxel a number of local features were calculated. In this paper we compared four different techniques for feature selection from a set of 315 features extracted for each voxel: (i) filter method based on the Kolmogorov-Smirnov test; two wrapper methods, respectively, (ii) Sequential Forward Selection and (iii) Sequential Backward Elimination; and (iv) embedded method based on the Random Forest Classifier on a set of 10 T1-weighted brain MRIs and tested on an independent set of 25 subjects. The resulting segmentations were compared with manual reference labelling. By using only 23 features for each voxel (sequential backward elimination) we obtained comparable state of-the-art performances with respect to the standard tool FreeSurfer.